NOVEL GENE ENCODING A DNA REPAIR ENDONUCLEASE
AND METHODS OF USE THEREOF 

FIELD OF THE INVENTION 

This invention relates to the field of DNA repair. 
Specifically, a novel human gene and its encoded 
endonuclease are disclosed. The gene may be used 
beneficially as a marker for genetic screening, 
mutational analysis and for assessing drug resistance in 
transformed cells. 

BACKGROUND OF THE INVENTION

Several publications are referenced in this
application in order to more fully describe the state of
the art to which this invention pertains. The
disclosure of each of these publications is incorporated
by reference herein.

Mismatch repair stabilizes the cellular genome by
correcting DNA replication errors and by blocking
recombination events between divergent DNA sequences.
The mechanism responsible for strand-specific correction
of mispaired bases has been highly conserved during
evolution. Eukaryotic homologs of bacterial MutS and
MutL, which are believed to play key roles in mismatch
repair recognition and initiation of repair, have been
identified in yeast and mammalian cells. Inactivation
of genes encoding these activities results in large
increases in spontaneous mutability, and in the case of
humans and rodents, predisposition to tumor development.

Lynch syndrome or hereditary nonpolyposis colon
cancer (HNPCC) is an autosomal dominant disease, which
accounts for approximately 1-5% of all colorectal cancer
cases. In this syndrome, colorectal tumors are
frequently associated with extracolonic malignancies,
such as cancers of the endometrium, stomach, ovary,
brain, skin and urinary tract. Tumors from HNPCC
patients harbor a genome-wide DNA replication/repair
defect. Due to the lack of pathognomonic morphological
or biomolecular markers, HNPCC has traditionally posed
unique problems to clinicians and geneticists alike,
both in terms of diagnosis and clinical management.
Recent breakthroughs in molecular biology have
partially elucidated the pathogenic mechanism of this
syndrome. Germline mutations in any one of five genes
encoding proteins that participate in a specialized DNA
mismatch repair system give rise to a predisposition for
cancer development in HNPCC families. Patients affected
by HNPCC carry these mutations in genes which are
involved in DNA mismatch repair. The DNA mismatch
repair mechanism contributes to mutational avoidance and
genetic stability, thus performing a tumor suppressor
function. Loss or inactivation of the wild type allele
in somatic cells leads to a dramatic increase of the
spontaneous mutation rate. This, in turn, results in
the accumulation of mutations in other tumor suppressor
genes and oncogenes, ultimately leading to neoplastic
transformation.

Microsatellites are repeating sequences that are
distributed throughout the human genome, most commonly
(A)n/(T)n and (CA)n/(GT)n. Their function is unknown,
but they are useful in genetic linkage studies because
of their high degree of polymorphism and normally stable
inheritance. Several of the genes responsible for HNPCC
have been identified using analysis of mutation rate in
DNA microsatellites. Mutations of mismatch repair genes
can be detected in a subset of sporadic colonic and
extracolonic cancers which exhibit variability in the
length of microsatellite sequences. This variability is
often referred to as microsatellite instability.
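
The notion of length variability at a microsatellite locus can be
illustrated with a minimal Python sketch. It is an illustration only
and not part of the disclosed methods: the function names and the
toy sequences are hypothetical, and real assays compare amplified
fragment lengths rather than raw sequence strings.

```python
import re

def longest_repeat_run(sequence: str, unit: str) -> int:
    """Return the largest number of consecutive copies of `unit` in `sequence`."""
    pattern = f"(?:{re.escape(unit.upper())})+"
    runs = re.findall(pattern, sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)

def shows_length_instability(normal_seq: str, tumor_seq: str, unit: str) -> bool:
    """Flag a locus whose repeat length differs between normal and tumor DNA."""
    return longest_repeat_run(normal_seq, unit) != longest_repeat_run(tumor_seq, unit)

# Hypothetical toy locus: a (CA)n repeat flanked by unique sequence.
normal_dna = "GGT" + "CA" * 12 + "TTG"
tumor_dna = "GGT" + "CA" * 9 + "TTG"

print(longest_repeat_run(normal_dna, "CA"))                   # repeat count in normal DNA
print(longest_repeat_run(tumor_dna, "CA"))                    # repeat count in tumor DNA
print(shows_length_instability(normal_dna, tumor_dna, "CA"))  # True when lengths differ
```

In this sketch a locus is flagged whenever the repeat count differs
at all; any threshold on the number of unstable loci needed to call
a tumor unstable is left to the reader and is not taken from the
text.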

Investigators in the field (Peltomaki et al.,
(1993) Science 260:810-812) have discovered that most
colorectal cancers from HNPCC patients show
microsatellite instability. These studies revealed that
the length of microsatellite DNA at different loci
varies between tumor DNA and non-tumor DNA from the same






WO 99/04626 



PCT/US98/15828 



patient. The phrase "replication error positive"
(RER+) has been used to describe such tumors. It should
be noted that only about 70% of HNPCC cases and only
about 65% of sporadic tumors with microsatellite
instability carry mutations in the known mismatch repair
genes (hMSH2, hMLH1, hPMS2, hMSH6 and hPMS1) (Liu et
al., (1996) Nature Medicine 2:169-174). The remaining
30-35% of the cases have an as yet unidentified mismatch
repair genetic defect. Thus, there is a pressing need
to identify the other active components in the DNA
mismatch repair pathway, as mutations in these genes may
result in an increased propensity for cancer.

The Fragile X or Martin Bell syndrome is the most
common single recognized form of inherited mental
retardation. Fifty percent of all X-linked mental
retardation may be attributable to the Fragile X
syndrome. The disorder is found in all ethnic groupings
with a frequency of 0.3-1 per 1000 males and 0.2-0.6 per
1000 females. The full clinical syndrome, which is
found in approximately 60% of affected males, consists
of moderate mental retardation with an IQ typically in
the range 35-50, elongated facies with large everted
ears, and macroorchidism. This syndrome is unusual in
that it is associated with the appearance of a fragile
site on the long arm of the X chromosome at Xq27.3
(Sutherland, G.R., (1977) Science 197:256-266). This
can be visualized cytogenetically in metaphase
chromosomes prepared from lymphocytes of affected
individuals which have been cultured under conditions of
folate deficiency or thymidine stress. The study of the
segregation of polymorphic markers within fragile X
families has confirmed that the mutation lies in the
same region of the X-chromosome as that exhibiting
cytogenetic fragility.

There is an imbalance of penetrance of the
phenotype associated with this syndrome in the different
generations of kindreds in which the mutation is
segregating. The likelihood of developing mental
impairment depends on an individual's position in the
pedigree. As the mutation progresses through the
generations, the risk of mental impairment increases.
These observations are not consistent with classical X
linkage and are collectively known as the Sherman
paradox. Hypotheses based on these observations have
suggested that the mutation exists in two forms: a
premutation and a full mutation form. Nonpenetrant
individuals are said to carry a premutation chromosome,
that is, a chromosome which has no abnormal phenotypic
effect but which is capable of progressing to a fully
penetrant mutation on passage through a female
oogenesis.

Two alterations in the DNA at the fragile X site
have been identified: abnormal amplification of a
CpG-rich DNA sequence (a CpG island) and hypermethylation
of such sequences. The molecular basis of the
amplification is the expansion of a CGG triplet
microsatellite into large arrays. In individuals
expressing the full clinical phenotype, the DNA in this
region becomes hypermethylated, leading to the
transcriptional shut down of the gene FMR-1 (fragile X
mental retardation 1) which is transcribed across this
region. It is the loss of gene expression that is
thought to account for the clinical phenotype. It has
been postulated that in Fragile X syndrome, expansion of
the (CGG)n repeat from premutation to full mutation may
be related to an aberrant (misdirected) DNA mismatch
repair event. This may be favored by the transient lack
of multiple methyl signals in the CGG repeat as well as
in flanking single copy sequences during early stages of
embryonal development. Similar to Fragile X syndrome,
defective DNA mismatch repair may play a role in the
expansion of triplet repeats associated with several
disorders such as myotonic dystrophy, Huntington's
disease, spino-cerebellar ataxias and Kennedy's disease.

The isolation of nucleic acids and proteins which
when mutated give rise to these various disorders
enables the development of diagnostic and prognostic
kits for assessing patients at risk. The biochemical
characterization of the genes encoding the components of
the DNA mismatch repair system may ultimately facilitate
gene replacement therapies for use in the treatment of
malignancy and other inherited genetic disorders.

SUMMARY OF THE INVENTION

This invention provides novel biological molecules
useful for identification, detection, and/or regulation
of components in the complex DNA recognition/repair
pathway. According to one aspect of the invention, an
isolated nucleic acid molecule is provided which
includes a sequence encoding an endonuclease protein of
a size between about 60 and 75 kilodaltons. The encoded
protein, referred to herein as MED1 (methyl-CpG binding
endonuclease 1), comprises a tripartite structure
including an amino terminal methyl-CpG binding domain
with significant homology to the rat protein, MeCP2 and
the human protein, PCM1, a central region rich in
positively-charged amino acids which contains nuclear
localization signals, and a carboxy terminal catalytic
domain which shares homology with several bacterial
endonucleases involved in DNA repair. The protein
demonstrates significant binding affinity for hMLH1 and
mMLH2. In a preferred embodiment of the invention, an
isolated nucleic acid molecule is provided that includes
a cDNA encoding a human endonuclease protein MED1. In
a particularly preferred embodiment, the human
endonuclease protein has an amino acid sequence the same
as Sequence I.D. No. 2. An exemplary nucleic acid
molecule of the invention comprises Sequence I.D. No. 1.

According to another aspect of the present
invention, an isolated nucleic acid molecule is
provided, which has a sequence selected from the group
consisting of: (1) Sequence I.D. No. 1; (2) a sequence
specifically hybridizing with preselected portions or
all of the complementary strand of Sequence I.D. No. 1;
a sequence encoding preselected portions of Sequence
I.D. No. 1, (3) a sequence encoding part or all of a
polypeptide having amino acid Sequence I.D. No. 2. Such
partial sequences are useful as probes to identify and
isolate homologues of the endonuclease gene of the
invention. Accordingly, isolated nucleic acid sequences
encoding natural allelic variants of Sequence I.D. No.
1 are also contemplated to be within the scope of the
present invention. The term natural allelic variants
will be defined hereinbelow.

In yet another embodiment of the invention,
isolated genomic DNA molecules are provided which encode
the Med-1 protein of the invention. These nucleic acids
(SEQ ID NO: 21 and 22) may be used to advantage in
screening assays which identify germline and somatic
mutations in the DNA encoding Med-1.

The present invention also provides MED1 genomic
nucleic acid of mouse or human origin having a sequence
substantially the same as that contained in phage stocks
as deposited on 28 July 1998 at the American Type
Culture Collection, 10801 University Blvd, Manassas,
Virginia 20110-2209 USA, under the terms of the Budapest
Treaty with accession number: not yet assigned.

MED1 polypeptide may conveniently be obtained by
introducing expression vectors into host cells in which
the vector is functional, culturing the host cells so
that the MED1 polypeptide is produced and recovering the
MED1 polypeptide from the host cells or the surrounding
medium. Vectors comprising nucleic acid according to the
present invention and host cells comprising such vectors
or nucleic acid form further aspects of the present
invention.

According to another aspect of the present
invention, an isolated human endonuclease protein is
provided which has a deduced molecular weight of between
about 60 kDa and 75 kDa. The protein comprises an
amino-terminal methyl-CpG binding domain with
significant homology to the rat protein MeCP2 and the
human protein PCM1, a central region rich in
positively-charged amino acids which contains nuclear
localization signals, and a carboxy terminal catalytic
domain which shares homology with several bacterial
endonucleases involved in DNA repair. In a preferred
embodiment of the invention, the protein is of human
origin, and has an amino acid sequence the same as
Sequence I.D. No. 2. In a further embodiment the protein
may be encoded by natural allelic variants of Sequence
I.D. No. 1. Inasmuch as certain amino acid variations
may be present in a MED1 protein encoded by a natural
allelic variant, such proteins are also contemplated to
be within the scope of the invention.

According to another aspect of the present
invention, antibodies immunologically specific for the
proteins described hereinabove are provided.

Various terms relating to the biological
molecules of the present invention are used hereinabove
and also throughout the specifications and claims. The
terms "specifically hybridizing," "percent similarity"
and "percent identity (identical)" are defined in detail
in the description set forth below.

With reference to nucleic acids of the
invention, the term "isolated nucleic acid" is sometimes
used. This term, when applied to DNA, refers to a DNA
molecule that is separated from sequences with which it
is immediately contiguous (in the 5' and 3' directions)
in the naturally occurring genome of the organism from
which it originates. For example, the "isolated nucleic
acid" may comprise a DNA or cDNA molecule inserted into
a vector, such as a plasmid or virus vector, or
integrated into the DNA of a prokaryote or eukaryote.

With respect to RNA molecules of the
invention, the term "isolated nucleic acid" primarily
refers to an RNA molecule encoded by an isolated DNA
molecule as defined above. Alternatively, the term may
refer to an RNA molecule that has been sufficiently
separated from RNA molecules with which it would be
associated in its natural state (i.e., in cells or
tissues), such that it exists in a "substantially pure"
form (the term "substantially pure" is defined below).

With respect to protein, the term "isolated
protein" or "isolated and purified protein" is sometimes
used herein. This term refers primarily to a protein
produced by expression of an isolated nucleic acid
molecule of the invention. Alternatively, this term may
refer to a protein which has been sufficiently separated
from other proteins with which it would naturally be
associated, so as to exist in "substantially pure" form.

The term "substantially pure" refers to a
preparation comprising at least 50-60% by weight the
compound of interest (e.g., nucleic acid,
oligonucleotide, protein, etc.). More preferably, the
preparation comprises at least 75% by weight, and most
preferably 90-99% by weight, the compound of interest.
Purity is measured by methods appropriate for the
compound of interest (e.g., chromatographic methods,
agarose or polyacrylamide gel electrophoresis, HPLC
analysis, and the like).

With respect to antibodies of the invention,
the term "immunologically specific" refers to antibodies
that bind to one or more epitopes of a protein of
interest (e.g., MED1), but which do not substantially
recognize and bind other molecules in a sample
containing a mixed population of antigenic biological
molecules.

With respect to oligonucleotides, the term
"specifically hybridizing" refers to the association
between two single-stranded nucleotide molecules of
sufficiently complementary sequence to permit such
hybridization under pre-determined conditions generally
used in the art (sometimes termed "substantially
complementary"). In particular, the term refers to
hybridization of an oligonucleotide with a substantially
complementary sequence contained within a
single-stranded DNA or RNA molecule of the invention, to
the substantial exclusion of hybridization of the
oligonucleotide with single-stranded nucleic acids of
non-complementary sequence.

The present invention also includes active
portions, fragments, derivatives and functional mimetics
of the MED1 polypeptide or protein of the invention.

An "active portion" of MED1 polypeptide means a
peptide which is less than said full length MED1
polypeptide, but which retains its essential biological
activity, e.g., methyl-CpG DNA binding and/or
endonuclease activity.

A "fragment" of the MED1 polypeptide means a stretch
of amino acid residues of at least about five to seven
contiguous amino acids, often at least about seven to
nine contiguous amino acids, typically at least about
nine to thirteen contiguous amino acids and, most
preferably, at least about twenty to thirty or more
contiguous amino acids. Fragments of the MED1
polypeptide sequence, antigenic determinants or epitopes
are useful for raising antibodies to a portion of the
MED1 amino acid sequence.

A "derivative" of the MED1 polypeptide or a fragment
thereof means a polypeptide modified by varying the
amino acid sequence of the protein, e.g., by manipulation
of the nucleic acid encoding the protein or by altering
the protein itself. Such derivatives of the natural
amino acid sequence may involve insertion, addition,
deletion or substitution of one or more amino acids,
without fundamentally altering the essential activity of
the wildtype MED1 polypeptide.

"Functional mimetic" means a substance which may not
contain an active portion of the MED1 amino acid
sequence, and probably is not a peptide at all, but
which retains the essential biological activity of
natural MED1 polypeptide.

The nucleic acids, proteins/polypeptides, peptides
and antibodies of the present invention may be used to
advantage as markers for diagnosis and prognosis of
those at risk for colon and other cancers. The
molecules may also be useful in the diagnosis and/or
treatment of Fragile X syndrome and other diseases
characterized by triplet repeat expansion. The MED1
molecules of the invention may also be used as research
tools and will facilitate the elucidation of the
mechanistic action of the novel genetic and protein
interactions involved in the maintenance of DNA
fidelity.

Thus, the present invention also provides nucleic
acid molecules, polypeptides and/or antibodies as
mentioned above for use in medical treatment.

Further, the present invention provides use of a
nucleic acid molecule, polypeptide and/or antibody in
the preparation of a medicament for treating cancer, in
particular, colorectal cancer.

In a further aspect of the present invention, there
is provided a kit for detecting mutations in the MED1
gene associated with cancer, or a susceptibility to
cancer, the kit comprising one or more nucleic acid
probes capable of binding and/or detecting a mutated
MED1 nucleic acid. Alternatively, the kit may comprise
one or more antibodies capable of specifically binding
and/or detecting a mutated MED1 nucleic acid or amino
acid sequence or a pair of oligonucleotide primers
having sequences corresponding to, or complementary to,
a portion of the nucleic acid sequence set out in
Sequence I.D. No. 1 or 5 for use in amplifying a MED1
nucleic acid sequence or mutant allele thereof.

In yet another aspect of the invention, transgenic
animals are provided which are useful for elucidating
the role of MED1 in growth and development. Isolation of
the mouse genomic DNA also facilitates the production of
MED1 knock-out mice.

Aspects and embodiments of the present invention
will now be illustrated, by way of example, with
reference to the accompanying figures. Further aspects
and embodiments will be apparent to those skilled in the
art. All documents mentioned in this text are
incorporated herein by reference.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 depicts EGY191 yeast cells cotransformed
with a combination of plasmids as indicated in the
figure along with pSH18-34. The yeast so transformed
were then selected on uracil-minus, histidine-minus,
tryptophan-minus glucose yeast medium to select for the
presence of all plasmids. Individual transformants were
replated either onto uracil-minus, histidine-minus,
tryptophan-minus, leucine-minus galactose yeast medium
to score activation of the LEU2 reporters (left panel)
or onto uracil-minus, histidine-minus, tryptophan-minus
galactose yeast medium containing
5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-gal)
to score activation of the LacZ reporters (right panel).
Growth on leucine-minus plates and blue-color formation
on X-gal plates illustrate the specificity of the
interaction between f5/MED1 and hMLH1. All interactions
were galactose specific. The interaction shown between
K-rev-1 and Krit1 represents a positive control.

Figure 2 depicts a Northern blot showing the
localization of MED1 mRNA in all tested tissues. A 2.4
kb transcript is observed and high levels of mRNA
expression are detected in heart, skeletal muscle and
pancreas. The size of the molecular weight standards is
indicated in kb.

Figure 3 shows an alignment of the cDNA of Sequence
I.D. No. 1 and its encoded endonuclease protein,
Sequence I.D. No. 2.

Figure 4A depicts homology analysis of the deduced
amino acid sequence of MED1 and several other
endonucleases involved in DNA recognition and repair.
Figure 4B depicts homology analysis of the deduced amino
acid sequence of MED1 and the methyl-CpG binding domain
of the rat protein, MeCP2. Figure 4C depicts homology
analysis of the deduced amino acid sequence of MED1 and
the methyl-CpG binding domain of the human protein,
PCM1.

Figure 5 is a schematic diagram illustrating the
domain organization of MED1 protein. The methyl-CpG
binding domain (MBD) and the endonuclease domain (endo)
are highlighted. Numbers indicate amino acid position.
The bar below the schematic diagram indicates the
portion of the protein encoded by the original f5 clone.

Figure 6 is an autoradiograph showing the results
of coupled in vitro transcription and translation of the
MED1 open reading frame. Two polypeptides of 70 and 65
kD are synthesized by pcDNA3-MED1 constructs. In
control reactions, lacking the MED1 cDNA, these
polypeptides are not synthesized.

Figures 7A and 7B show a schematic diagram (Fig.
7A) of carboxy- and amino-terminal hemagglutinin-tagged
(HT) MED1 proteins and a Western blot (Fig. 7B) showing
protein expression following transfection of the
constructs into NIH 3T3 cells. A band of approximately
72 kD is present in cells transfected with the
carboxy-terminally tagged MED1-HT. This band co-migrates
with the one present in HT-MED1-M1 transfectants,
indicating that the first ATG at nucleotide position 142
is the initiation codon in vivo.

Figure 8 is a partial metaphase spread of human
chromosomes showing the chromosomal localization of MED1
by FISH. Hybridization is detected on chromosome 3q21
(arrow). An elongated chromosome 3 is shown in the
inset.

Figures 9A and 9B are gels and blots demonstrating
the nuclease activity of the recombinant endonuclease
domain. Figure 9A is a Coomassie-stained SDS-PAGE
showing IPTG induction of the bacterially-expressed
18-22-kD MED1 endonuclease domain (codons 455-580)
(arrowhead, left panel). In a parallel SDS-PAGE
nuclease activity gel (containing heat-denatured calf
thymus DNA), the IPTG-induced 18-22-kD MED1 endonuclease
domain is negatively stained with the DNA dye, toluidine
blue (arrowhead, right panel). P, pellet of 10,000 x g
centrifugation; S, supernatant of 10,000 x g
centrifugation. Figure 9B shows endonuclease activity
of recombinant wild-type MED1. The entire wild-type
MED1 and a deletion mutant lacking the endonuclease
domain (Δendo) were expressed in bacteria, purified by
nickel-agarose chromatography and stained with Coomassie
following SDS-PAGE (left panel). Increasing amounts of
the wild-type and Δendo mutant (22 to 175 ng) were
incubated with 500 ng of the 3.9 kb supercoiled plasmid
pCR2 (Invitrogen) at 37°C for 30. Reaction products
were separated on a 1% agarose gel buffered in 1x TAE
and containing 0.25 µg/ml ethidium bromide (right
panel). Wild-type MED1, but not Δendo, generated nicked
and linearized DNA. M, lambda/HindIII digest size
standards; I, input plasmid DNA, incubated with reaction
buffer only.

Figure 10A is an autoradiograph showing the results
of a mobility shift assay of 293 cell lysates expressing
the fusion protein Flag-MED1/f5. Flag-peptide eluates
from anti-Flag immunoprecipitations of
Flag-MED1/f5-expressing 293 cells demonstrate binding
activity when incubated with a 32P-labeled
double-stranded oligonucleotide containing five fully
methylated CpG sites. A mobility shift assay of
recombinant MED1 MBD (codons 1-154) with methylated and
unmethylated DNA probes is shown in Figure 10B. The
purified MED1 MBD demonstrates binding activity when
incubated with a 32P-labeled double-stranded
oligonucleotide containing five methylated CpG sites
(lane 2). Binding is abolished by pre-incubation with a
100-fold excess of the cold methylated oligonucleotide
(lane 3), but not of the cold unmethylated
oligonucleotide (lane 4). No binding is detected when
the unmethylated probe is used (lanes 5-8).

Figures 11A and 11B are autoradiographs showing the
coimmunoprecipitation of hMSH2 with Flag-MED1/f5. Fig.
11A shows a band reacting with the anti-hMSH2 antibody.
Comigration with hMSH2 is detected by western blotting
in anti-FLAG immunoprecipitates from Flag-MED1/f5
transfected cells but not control cells. Fig. 11B is a
western blot of a parallel gel with the anti-FLAG
antibody confirming expression of the Flag-MED1/f5
construct in transfected 293 cells.

Co-immunoprecipitation of MED1 and MLH1 from human
cells is shown in Figure 11C. A band reacting with the
anti-MLH1 antibody and comigrating with MLH1 is detected
by western blotting in anti-hemagglutinin
immunoprecipitates from HT-MED1/CMV5-transfected HEK-293
cells and not from CMV5-transfected control cells (upper
panel). Western blotting of a parallel gel with the
anti-hemagglutinin antibody confirms expression of the
HT-MED1 construct in transfected HEK-293 cells (lower
panel). Lysis buffers contained 0.5% NP-40 (lanes 1-4),
0.2% NP-40 (lanes 5-6) or 1% Triton X-100 (lanes 7-8).






Figure 12 is a schematic diagram depicting a model
for strand targeting in eukaryotic mismatch repair.
Recognition of the hemimethylated d(GATC) site by E.
coli MutH (upper panel) is paralleled by recognition of
the hemimethylated CpG site by human MED1 (lower panel).

Figure 13 shows a series of MED1 mutations which
have been isolated from colon cancer patients. Figures
13A and 13B show MED1 sequencing electropherograms (ABI)
of three colon tumor DNAs and a normal control DNA.
Tumors c220T and c226T harbor an apparently heterozygous
adenine deletion at the (A)10 track (codons 310-313)
with predicted frameshift and stop at codon 317 (Fig.
13A). The same mutation was also found in tumor c18T.
Tumor c215T harbors an apparently heterozygous adenine
deletion at the (A)6 track (codons 280-282) with
predicted frameshift and stop at codon 302 (Fig. 13B).
Figure 13C shows a schematic diagram of the truncated
products predicted to be encoded by the mutant MED1
alleles in the indicated tumors.
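
The way a single-base deletion in a homopolymeric adenine run shifts
the reading frame and exposes a premature stop codon can be sketched
in Python as follows. This is an illustration under stated
assumptions, not the MED1 sequence: `WILD_TYPE` is a hypothetical toy
open reading frame, and the helper names are invented for this
example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_stop_codon(cds: str):
    """Return the 1-based index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None

def delete_one_base(cds: str, position: int) -> str:
    """Return `cds` with the base at 0-based `position` removed."""
    return cds[:position] + cds[position + 1:]

# Hypothetical toy reading frame containing an (A)10 run:
# ATG GCA AAA AAA AAA GTG ACC TGC TAA
WILD_TYPE = "ATGGCA" + "A" * 9 + "GTGACCTGCTAA"

# Remove one adenine from the run (any A in the run gives the same result).
mutant = delete_one_base(WILD_TYPE, 5)

print(first_stop_codon(WILD_TYPE))  # stop codon position in the intact frame
print(first_stop_codon(mutant))     # earlier stop codon exposed by the frameshift
```

In the toy sequence the deletion moves the stop from the ninth codon
to the sixth, which mirrors in miniature how the deletions described
for Figure 13 are predicted to truncate the encoded product.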

Figure 14 is a schematic diagram of the genomic
structure of the human MED1 gene (lambda clone MED1 HGL
#16). The position of the eight exons is indicated.
Numbers above the exon boxes refer to exon number;
numbers below the exon boxes refer to the size of the
exons in base pairs. Exon 1 and part of the intervening
intron between exon 1 and exon 2 were cloned by PCR
(indicated by the hatched line). The start (ATG) and
stop (TAA) codons are marked. E: restriction site for
the enzyme EcoRI.

Figure 15 is a blot showing the conservation of the
MED1 gene ("Zooblot"). A low stringency Southern blot
of genomic DNA from indicated vertebrate species reveals
bands cross-hybridizing with a human MED1 cDNA probe in
mammals (panel A) and non-mammalian vertebrates (panel
B). The migration and size (in kilobase pairs) of the
DNA standards are indicated.

Figure 16 shows a schematic of the genomic
structure of the mouse MED1 gene (lambda clone MED1 MGL
#3). The position of seven exons is indicated. Numbers
above the exon boxes refer to exon number; numbers below
the exon boxes refer to the size of the exons in base
pairs. The size and position of exon 1 are not well
defined (as indicated by the dotted line). The start
(ATG) codon is marked. The stop codon is presumably
located in exon 8 which is not contained in this lambda
clone. E: restriction site for the enzyme EcoRI; S:
restriction site for the enzyme SalI.






Figure 17 shows the nucleotide sequence (SEQ ID NO: 
5) of the mouse cDNA MED1 sequence assembled by 
juxtaposition of seven exons derived from the genomic 
clone MED1 MGL #3. 



Figure 18 shows a comparison of the predicted mouse
MED1 protein sequence with the human MED1 protein
sequence. Upper sequence: mouse MED1; lower sequence:
human MED1. Identical amino acids between the two
sequences are indicated by a line, similar amino acids
by one (low similarity) or two dots (high similarity).

Figure 19 shows the intron and exon sequences of
the mouse genomic clone encoding MED1. Exon sequences
are shown in upper case; intron sequences are shown in
lower case. The splice donor (gt) and acceptor (ag)
sites are in bold.

Figure 20 shows the intron and exon sequences of
the human genomic clone encoding MED1. Exon sequences
are shown in upper case; intron sequences are shown in
lower case. The splice donor (gt) and acceptor (ag)
sites are in bold.

DETAILED DESCRIPTION OF THE INVENTION 

Hereditary Non-Polyposis Colorectal Cancer (HNPCC),
or Lynch Syndrome, is an autosomal dominant disorder
characterized by early onset colorectal tumors. As
noted above, tumors from HNPCC patients harbor a
genome-wide DNA replication/repair defect, the hallmark
of which is length instability of microsatellite repeat
sequences. Patients affected by HNPCC carry a germline
mutation in genes involved in DNA mismatch repair, a
specialized system which handles base-base mismatches,
short insertions/deletions and recombination-derived
heteroduplexes (Kolodner, R.D., (1995) Trends in
Biochem. Sci. 20:397-4053; Modrich and Lahue, (1996)
Annu. Rev. Biochem. 65:101-133). The mismatch repair
pathway contributes to mutational avoidance and genetic
stability, thus performing a tumor suppressor function.
Loss or inactivation of the wild type allele in somatic
cells leads to a dramatic increase of the spontaneous
mutation rate. This, in turn, results in the
accumulation of mutations in other tumor suppressor
genes and oncogenes, ultimately leading to neoplastic
transformation (Bellacosa et al., (1996) Am. J. of Med.
Genetics 62:353-364). Similarly to other genes involved
in tumor suppression, mutations of mismatch repair genes
can be detected in a subset of sporadic colonic and
extracolonic cancers which exhibit microsatellite
instability (Liu et al., 1996, supra).

Any one of five DNA mismatch repair genes (hMSH2,
hMLH1, hPMS2, hMSH6 and hPMS1) is found to be mutated in
the germline DNA of HNPCC patients (Liu et al., 1996,
supra). These genes encode human homologues of the E.
coli mismatch repair proteins MutS and MutL, which
belong to the methyl-directed mismatch repair system
(Kolodner, R.D., 1995, supra). Repair by this system
involves 10 biochemical activities and is organized in
3 sequential steps of initiation, excision and
resynthesis (Modrich, P., (1991) Ann. Rev. Genet.
25:229-253). During initiation, the mismatch is detected and
a single-strand cut is made on the newly synthesized DNA
strand which contains the mutation. Then, single-strand
exonucleases (exo I, exo VII, RecJ) excise a span of
about 1-2 kbp containing the mismatch and finally
resynthesis by DNA polymerase III takes place. The
products of the mutSLH genes mediate the initiation
step. MutS detects and binds to the mismatch. Through
an interaction with MutL, which likely functions as an
interface with MutS, the single-strand endonuclease MutH
is activated and cuts the DNA strand carrying the
mutation (Modrich, P., 1991, supra).

A similar biochemical pathway has been identified
in eukaryotic cells, and it is also characterized by
strand-specificity and bidirectional excision capability
(Fang and Modrich, (1993) J. Biol. Chem. 268:11838-11844).
In the bacterial system, MutH has the pivotal
role of identifying the newly synthesized strand, i.e.
the strand carrying the mutation. Without this function
there would be a 50% chance of initiating repair on the
parental strand, thereby stabilizing the mutation. MutH
identifies and cleaves the new strand by virtue of its
transient lack of adenine methylation at d(GATC) sites
(Modrich, P., 1991, supra). Despite its crucial
function, homologues of MutH, i.e. eukaryotic mismatch
repair endonucleases, have not been identified to date.
Furthermore, the molecular determinants of strand
discrimination in eukaryotic cells - which lack d(GATC)
methylation - are not presently known (Kolodner, R.D.,
1995, supra; Modrich and Lahue, 1996, supra). In order
to gain insight into the mechanisms of strand
recognition, it is essential to identify the eukaryotic
functional homologue of the MutH endonuclease. Due to
its proposed central role in mismatch repair,
inactivation of this enzyme could be responsible for at
least some cases of HNPCC.
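
By way of illustration only, the bacterial strand-discrimination rule described above (cleavage of the strand that transiently lacks adenine methylation at d(GATC) sites) may be sketched in Python as follows. The function names and the boolean representation of methylation state are hypothetical conveniences introduced for this sketch, and the model is deliberately simplified; it is not a description of MED1 or of any eukaryotic mechanism.

```python
def find_gatc_sites(seq):
    """Return 0-based start positions of every GATC site in seq.

    GATC is palindromic, so each position marks a site on both strands.
    """
    seq = seq.upper()
    sites = []
    pos = seq.find("GATC")
    while pos != -1:
        sites.append(pos)
        pos = seq.find("GATC", pos + 1)
    return sites


def strand_to_nick(top_methylated, bottom_methylated):
    """Pick the strand a MutH-like activity would cut at one GATC site.

    Returns "top" or "bottom" for a hemimethylated site (the unmethylated
    strand is taken to be the newly synthesized one), or None when the
    site is fully methylated or fully unmethylated and therefore carries
    no strand information in this simplified model.
    """
    if top_methylated and not bottom_methylated:
        return "bottom"
    if bottom_methylated and not top_methylated:
        return "top"
    return None
```

For example, a site that is methylated only on the top (parental) strand would direct the cut to the bottom strand.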

As mentioned previously, aberrant DNA methylation
may also play a role in Fragile X Syndrome. After semi-
conservative replication of DNA, the mismatch repair
system is able to use the conserved strand as a template
to correct mismatches resulting from replication errors
which are by definition in the newly synthesized strand.
DNA replication results in a transient state of
hemimethylation in which methylation occurs only on the
template strand. In Fragile X Syndrome, the CGG repeats
and subsequent expansion of these repeats may be
triggered by undermethylation leading to misdirection of
DNA mismatch repair. MED1-encoded proteins may play a
pivotal role in this aberrant DNA replication/repair
event. As mentioned earlier, this could also be the
case for other diseases associated with repeat
expansion, such as myotonic dystrophy, Huntington's
disease, spino-cerebellar ataxias and Kennedy's disease.

The genomic and cDNA cloning of MED1, the DNA
molecule of the invention, which encodes a protein
bearing homology to bacterial endonucleases, is described
in detail below. Analysis of the predicted amino acid
sequence of the MED1 protein suggests a putative
mechanism of strand recognition based on cytosine
methylation at CpG sites. Like other DNA recognition
and repair genes which are mutated in HNPCC as well as
in sporadic cancers with microsatellite instability,
MED1 is a candidate nucleic acid for cancer genetic
testing, both in HNPCC families and in sporadic cancers
with microsatellite instability. Aberrant MED1 activity
may also be associated with Fragile X Syndrome and other
diseases characterized by triplet repeat expansion.

I. Preparation of MED1-Encoding Nucleic Acid
Molecules, MED1 Proteins, and Antibodies Thereto

A. Nucleic Acid Molecules

Nucleic acid molecules encoding the MED1
endonuclease of the invention may be prepared by two
general methods: (1) Synthesis from appropriate
nucleotide triphosphates, or (2) Isolation from
biological sources. Both methods utilize protocols well
known in the art.

The availability of nucleotide sequence
information, such as the nearly full length cDNA having
Sequence I.D. No. 1, enables preparation of an isolated
nucleic acid molecule of the invention by
oligonucleotide synthesis. Synthetic oligonucleotides
may be prepared by the phosphoramidite method employed
in the Applied Biosystems 38A DNA Synthesizer or similar
devices. The resultant construct may be purified
according to methods known in the art, such as high
performance liquid chromatography (HPLC). Long, double-
stranded polynucleotides, such as a DNA molecule of the
present invention, must be synthesized in stages, due to
the size limitations inherent in current oligonucleotide
synthetic methods. Thus, for example, a 2.4 kb double-
stranded molecule may be synthesized as several smaller
segments of appropriate complementarity. Complementary
segments thus produced may be annealed such that each
segment possesses appropriate cohesive termini for
attachment of an adjacent segment. Adjacent segments
may be ligated by annealing cohesive termini in the
presence of DNA ligase to construct an entire 2.4 kb
double-stranded molecule. A synthetic DNA molecule so
constructed may then be cloned and amplified in an
appropriate vector.
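
The staged synthesis described above may be illustrated by the following Python sketch, which divides a long sequence into top-strand and bottom-strand oligonucleotides whose junctions are staggered so that each annealed pair presents cohesive termini to its neighbours. The function names, the segment length and the overhang length are assumptions chosen for illustration only; the sketch further assumes an upper-case A/C/G/T sequence whose final segment is longer than the overhang, and it does not address oligonucleotide quality, secondary structure or ligation efficiency.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_segments(seq, segment_len=60, overhang=10):
    """Split seq into top- and bottom-strand oligos with staggered ends.

    Top-strand oligos tile seq in consecutive chunks of segment_len.
    Bottom-strand junctions are shifted by `overhang` bases, so each
    annealed top/bottom pair exposes single-stranded termini that are
    complementary to those of its neighbours. Both outer ends are blunt.
    """
    if not 0 < overhang < segment_len:
        raise ValueError("overhang must be between 0 and segment_len")
    n = len(seq)
    top_cuts = list(range(0, n, segment_len)) + [n]
    # Shift every internal junction on the bottom strand.
    bottom_cuts = [0] + [min(c + overhang, n) for c in top_cuts[1:-1]] + [n]
    tops = [seq[a:b] for a, b in zip(top_cuts, top_cuts[1:])]
    bottoms = [reverse_complement(seq[a:b])
               for a, b in zip(bottom_cuts, bottom_cuts[1:])]
    return tops, bottoms
```

Under these assumptions, a 2.4 kb molecule divided into 60-base segments would yield 40 oligonucleotides per strand.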

Nucleic acid sequences encoding MED1 may be
isolated from appropriate biological sources using
methods known in the art. In a preferred embodiment, a
cDNA clone is isolated from a cDNA expression library of
human origin. In an alternative embodiment, utilizing
the sequence information provided by the cDNA sequence,
genomic clones encoding MED1 may be isolated.
Alternatively, cDNA or genomic clones having homology
with MED1 may be isolated from other species, such as
mouse, using oligonucleotide probes corresponding to
predetermined sequences within the MED1 gene.

In accordance with the present invention, nucleic
acids having the appropriate level of sequence homology
with the protein coding region of Sequence I.D. No. 1
may be identified by using hybridization and washing
conditions of appropriate stringency. For example,
hybridizations may be performed, according to the method
of Sambrook et al. (supra), using a hybridization
solution comprising: 5X SSC, 5X Denhardt's reagent,
0.5-1.0% SDS, 100 µg/ml denatured, fragmented salmon
sperm DNA, 0.05% sodium pyrophosphate and up to 50%
formamide. Hybridization is carried out at 37-42°C for
at least six hours. Following hybridization, filters
are washed as follows: (1) 5 minutes at room temperature
in 2X SSC and 0.5-1% SDS; (2) 15 minutes at room
temperature in 2X SSC and 0.1% SDS; (3) 30 minutes-1
hour at 37°C in 1X SSC and 1% SDS; (4) 2 hours at
42-65°C in 1X SSC and 1% SDS, changing the solution every 30
minutes.

One common formula for calculating the stringency
conditions required to achieve hybridization between
nucleic acid molecules of a specified sequence homology
is (Sambrook et al., 1989):

Tm = 81.5°C + 16.6 log[Na+] + 0.41(% G+C) - 0.63(% formamide) - 600/#bp in duplex

As an illustration of the above formula, using
[Na+] = 0.368 M and 50% formamide, with GC content of
42% and an average probe size of 200 bases, the Tm is
57°C. The Tm of a DNA duplex decreases by 1 - 1.5°C with
every 1% decrease in homology. Thus, targets with
greater than about 75% sequence identity would be
observed using a hybridization temperature of 42°C.
Such a sequence would be considered substantially
homologous to the nucleic acid sequence of the present
invention.
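
The Tm calculation above may be reproduced with a short Python sketch. The function and parameter names are illustrative only, and the logarithm is assumed to be base 10 with [Na+] expressed in moles per liter, as is conventional for this formula.

```python
import math


def melting_temperature(na_molar, gc_percent, formamide_percent, duplex_bp):
    """Tm in degrees C from the formula quoted above (Sambrook et al., 1989)."""
    return (81.5
            + 16.6 * math.log10(na_molar)   # assumed base-10 logarithm
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / duplex_bp)


tm = melting_temperature(na_molar=0.368, gc_percent=42,
                         formamide_percent=50, duplex_bp=200)
print(round(tm))  # 57, matching the worked example in the text
```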

Nucleic acids of the present invention may be
maintained as DNA in any convenient cloning vector. In
a preferred embodiment, clones are maintained in a
plasmid cloning/expression vector, such as pBluescript
(Stratagene, La Jolla, CA), which is propagated in a
suitable E. coli host cell. Genomic clones of the
invention encoding the human or mouse MED1 gene may be
maintained in lambda phage FIX II (Stratagene).

MED1-encoding nucleic acid molecules of the
invention include cDNA, genomic DNA, RNA, and fragments
thereof which may be single- or double-stranded. Thus,
this invention provides oligonucleotides (sense or
antisense strands of DNA or RNA) having sequences
capable of hybridizing with at least one sequence of a
nucleic acid molecule of the present invention, such as
selected segments of the cDNA having Sequence I.D. No.
1. Such oligonucleotides are useful as probes for
detecting or isolating MED1 genes.

It will be appreciated by persons skilled in the
art that variants (e.g., allelic variants) of these
sequences exist in the human population, and must be
taken into account when designing and/or utilizing
oligos of the invention. Accordingly, it is within the
scope of the present invention to encompass such
variants, with respect to the MED1 sequences disclosed
herein or the oligos targeted to specific locations on
the respective genes or RNA transcripts. With respect
to the inclusion of such variants, the term "natural
allelic variants" is used herein to refer to various
specific nucleotide sequences and variants thereof that
would occur in a human population. Genetic
polymorphisms giving rise to conservative or neutral
amino acid substitutions in the encoded protein are
examples of such variants. Additionally, the term
"substantially complementary" refers to oligo sequences
that may not be perfectly matched to a target sequence,
but the mismatches do not materially affect the ability
of the oligo to hybridize with its target sequence under
the conditions described.

Thus, the coding sequence may be that shown in
Sequence I.D. No. 1, or it may be a mutant, variant,
derivative or allele of this sequence. The sequence may
differ from that shown by a change which is one or more
of addition, insertion, deletion and substitution of one
or more nucleotides of the sequence shown. Changes to a
nucleotide sequence may result in an amino acid change
at the protein level, or not, as determined by the
genetic code.

Thus, nucleic acid according to the present
invention may include a sequence different from the
sequence shown in Sequence I.D. No. 1 yet encode a
polypeptide with the same amino acid sequence.
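
The degeneracy of the genetic code referred to above may be illustrated with the following Python sketch, which tests whether two coding sequences encode the same amino acid sequence. The standard genetic code is assumed, the function names are hypothetical, and any sequences supplied to it are the user's own; none are taken from Sequence I.D. No. 1.

```python
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code; "*" marks a stop codon.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence, read in frame from its first base."""
    cds = cds.upper().replace("U", "T")
    return "".join(CODON_TABLE[cds[i:i + 3]]
                   for i in range(0, len(cds) - 2, 3))


def is_silent(reference_cds, variant_cds):
    """True if two coding sequences encode the same amino acid sequence."""
    return translate(reference_cds) == translate(variant_cds)
```

For instance, changing the codon GCT to GCC leaves the encoded alanine unchanged, whereas changing it to GAT substitutes aspartic acid.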

On the other hand, the encoded polypeptide may
comprise an amino acid sequence which differs by one or
more amino acid residues from the amino acid sequence
shown in Sequence I.D. No. 2. Nucleic acid encoding a
polypeptide which is an amino acid sequence mutant,
variant, derivative or allele of the sequence shown in
Sequence I.D. No. 2 is further provided by the present
invention. Nucleic acid encoding such a polypeptide may
show greater than 60% homology with the coding sequence
shown in Sequence I.D. No. 1, greater than about 70%
homology, greater than about 80% homology, greater than
about 90% homology or greater than about 95% homology.

Also within the scope of the invention are
antisense oligonucleotide sequences based on the MED1
nucleic acid sequences described herein. Antisense
oligonucleotides may be designed to hybridize to the
complementary sequence of nucleic acid, pre-mRNA or
mature mRNA, interfering with the production of
polypeptides encoded by a given DNA sequence (e.g.
either native MED1 polypeptide or a mutant form
thereof), so that its expression is reduced or prevented
altogether. In addition to the MED1 coding sequence,
antisense techniques can be used to target control
sequences of the MED1 gene, e.g. in the 5' flanking
sequence of the MED1 coding sequence, whereby the
antisense oligonucleotides can interfere with MED1
control sequences. The construction of antisense
sequences and their use is described in Peyman and
Ulman, Chemical Reviews, 90:543-584, (1990), Crooke,
Ann. Rev. Pharmacol. Toxicol., 32:329-376, (1992), and
Zamecnik and Stephenson, Proc. Natl. Acad. Sci.,
75:280-284, (1978).
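
As a simple illustration of the antisense principle described above, the following Python sketch derives a DNA oligonucleotide complementary to a chosen window of an mRNA. The function name, the default 20-nucleotide length and the choice of a DNA (rather than RNA or chemically modified) oligonucleotide are assumptions made for this sketch; target-site selection in practice involves considerations not modeled here.

```python
# Maps each mRNA base to the complementary DNA base.
COMPLEMENT = str.maketrans("ACGU", "TGCA")


def antisense_oligo(mrna, start, length=20):
    """Return a DNA antisense oligo (5'->3') for mrna[start:start+length]."""
    target = mrna.upper().replace("T", "U")[start:start + length]
    if len(target) < length:
        raise ValueError("target window runs past the end of the transcript")
    # Complement each base, then reverse so the result reads 5'->3'.
    return target.translate(COMPLEMENT)[::-1]
```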

The present invention provides a method of
obtaining nucleic acid of interest, the method including
hybridization of a probe having part or all of the
sequence shown in Sequence I.D. No. 1 or a complementary
sequence, to target nucleic acid. Hybridization is
generally followed by identification of successful
hybridization and isolation of nucleic acid which has
hybridized to the probe, which may involve one or more
steps of PCR.

Such oligonucleotide probes or primers, as well as
the full-length sequence (and mutants, alleles,
variants, and derivatives) are useful in screening a
test sample containing nucleic acid for the presence of
alleles, mutants or variants, especially those that
confer susceptibility or predisposition to cancers, the
probes hybridizing with a target sequence from a sample
obtained from the individual being tested. The
conditions of the hybridization can be controlled to
minimize non-specific binding, and preferably stringent
to moderately stringent hybridization conditions are
used. The skilled person is readily able to design such
probes, label them and devise suitable conditions for
hybridization reactions, assisted by textbooks such as
Sambrook et al (1989) and Ausubel et al (1992).

In some preferred embodiments, oligonucleotides
according to the present invention that are fragments of
the sequences shown in Sequence I.D. No. 1 or Sequence
I.D. No. 5, or any allele associated with cancer
susceptibility, are at least about 10 nucleotides in
length, more preferably at least 15 nucleotides in
length, more preferably at least about 20 nucleotides in
length. Such fragments themselves individually represent
aspects of the present invention. Fragments and other
oligonucleotides may be used as primers or probes as
discussed but may also be generated (e.g. by PCR) in
methods concerned with determining the presence in a
test sample of a sequence indicative of cancer
susceptibility.

Methods involving use of nucleic acid in diagnostic
and/or prognostic contexts, for instance in determining
susceptibility to cancer, and other methods concerned
with determining the presence of sequences indicative of
cancer susceptibility are discussed below.

Nucleic acid according to the present invention may
be used in methods of gene therapy, for instance in
treatment of individuals with the aim of preventing or
curing (wholly or partially) cancer. This too is
discussed below.

B. Proteins

MED1 protein demonstrates methyl-CpG DNA
binding and endonuclease activity. A full-length MED1
protein of the present invention may be prepared in a
variety of ways, according to known methods. The
protein may be purified from appropriate sources, e.g.,
transformed bacterial or animal cultured cells or
tissues, by immunoaffinity purification. However, this
is not a preferred method due to the low amount of
protein likely to be present in a given cell type at any
time. The availability of nucleic acid molecules
encoding MED1 enables production of the protein using in
vitro expression methods known in the art. For example,
a cDNA or gene may be cloned into an appropriate in
vitro transcription vector, such as pSP64 or pSP65 for
in vitro transcription, followed by cell-free
translation in a suitable cell-free translation system,
such as wheat germ or rabbit reticulocyte lysates. In
vitro transcription and translation systems are
commercially available, e.g., from Promega Biotech,
Madison, Wisconsin or BRL, Rockville, Maryland.

Alternatively, according to a preferred
embodiment, larger quantities of MED1 may be produced by
expression in a suitable prokaryotic or eukaryotic
system. For example, part or all of a DNA molecule,
such as the cDNA having Sequence I.D. No. 1, may be
inserted into a plasmid vector adapted for expression in
a bacterial cell, such as E. coli. Such vectors
comprise the regulatory elements necessary for
expression of the DNA in the host cell (e.g. E. coli)
positioned in such a manner as to permit expression of
the DNA in the host cell. Such regulatory elements
required for expression include promoter sequences,
transcription initiation sequences and, optionally,
enhancer sequences.

The MED1 produced by gene expression in a
recombinant prokaryotic or eukaryotic system may be
purified according to methods known in the art. In a
preferred embodiment, a commercially available
expression/secretion system can be used, whereby the
recombinant protein is expressed and thereafter secreted
from the host cell, to be easily purified from the
surrounding medium. If expression/secretion vectors are
not used, an alternative approach involves purifying the
recombinant protein by affinity separation, such as by
immunological interaction with antibodies that bind
specifically to the recombinant protein or nickel
columns for isolation of recombinant proteins tagged
with 6-8 histidine residues at their N-terminus or
C-terminus. Alternative tags may comprise the FLAG
epitope or the hemagglutinin epitope. Such methods are
commonly used by skilled practitioners.

The MED1 proteins of the invention, prepared
by the aforementioned methods, may be analyzed according
to standard procedures. For example, such proteins may
be subjected to amino acid sequence analysis, according
to known methods.

As discussed above, a convenient way of producing
a polypeptide according to the present invention is to
express nucleic acid encoding it, by use of the nucleic
acid in an expression system. The use of expression
systems has reached an advanced degree of sophistication
today.

Accordingly, the present invention also encompasses
a method of making a polypeptide (as disclosed), the
method including expression from nucleic acid encoding
the polypeptide (generally nucleic acid according to the
invention). This may conveniently be achieved by growing
a host cell in culture, containing such a vector, under
appropriate conditions which cause or allow production
of the polypeptide. Polypeptides may also be produced in
in vitro systems, such as reticulocyte lysate.

Polypeptides which are amino acid sequence
variants, alleles, derivatives or mutants are also
provided by the present invention. A polypeptide which
is a variant, allele, derivative, or mutant may have an
amino acid sequence that differs from that given in
Sequence I.D. No. 2 by one or more of addition,
substitution, deletion and insertion of one or more
amino acids. Preferred such polypeptides have MED1
function, that is to say have one or more of the
following properties: methyl-CpG DNA binding activity;
endonuclease activity; immunological cross-reactivity
with an antibody reactive with the polypeptide for which
the sequence is given in Sequence I.D. No. 2; sharing an
epitope with the polypeptide for which the sequence is
given in Sequence I.D. No. 2 (as determined for example
by immunological cross-reactivity between the two
polypeptides).

A polypeptide which is an amino acid sequence
variant, allele, derivative or mutant of the amino acid
sequence shown in Sequence I.D. No. 2 may comprise an
amino acid sequence which shares greater than about 35%
sequence identity with the sequence shown, greater than
about 40%, greater than about 50%, greater than about
60%, greater than about 70%, greater than about 80%,
greater than about 90% or greater than about 95%.
Particular amino acid sequence variants may differ from
that shown in Sequence I.D. No. 2 by insertion, addition,
substitution or deletion of 1 amino acid, 2, 3, 4, 5-10,
10-20, 20-30, 30-40, 40-50, 50-100, 100-150, or more
than 150 amino acids.
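
Percent sequence identity of the kind recited above may be computed from a pairwise alignment along the lines of the following Python sketch. The sketch assumes the two sequences have already been aligned to equal length with "-" as the gap character, and it counts a gap opposite a residue as a difference; other conventions for the denominator are in use and would give somewhat different values. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite a
    residue counts as a difference.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```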

A polypeptide according to the present
invention may be used in screening for molecules which
affect or modulate its activity or function. Such
molecules may be useful in a therapeutic (possibly
including prophylactic) context.

The present invention also provides antibodies
capable of immunospecifically binding to proteins of the
invention. Polyclonal antibodies directed toward MED1
may be prepared according to standard methods. In a
preferred embodiment, monoclonal antibodies are
prepared, which react immunospecifically with various
epitopes of MED1. Monoclonal antibodies may be prepared
according to general methods of Kohler and Milstein,
following standard protocols. Polyclonal or monoclonal
antibodies that immunospecifically interact with MED1
can be utilized for identifying and purifying such
proteins. For example, antibodies may be utilized for
affinity separation of proteins with which they
immunospecifically interact. Antibodies may also be
used to immunoprecipitate proteins from a sample
containing a mixture of proteins and other biological
molecules. Other uses of anti-MED1 antibodies are
described below.

Antibodies according to the present invention may
be modified in a number of ways. Indeed the term
"antibody" should be construed as covering any binding
substance having a binding domain with the required
specificity. Thus, the invention covers antibody
fragments, derivatives, functional equivalents and
homologues of antibodies, including synthetic molecules
and molecules whose shape mimics that of an antibody
enabling it to bind an antigen or epitope.

Exemplary antibody fragments, capable of binding an
antigen or other binding partner, are the Fab fragment
consisting of the VL, VH, CL and CH1 domains; the Fd
fragment consisting of the VH and CH1 domains; the Fv
fragment consisting of the VL and VH domains of a single
arm of an antibody; the dAb fragment which consists of
a VH domain; isolated CDR regions and F(ab')2 fragments,
a bivalent fragment including two Fab fragments linked
by a disulphide bridge at the hinge region. Single chain
Fv fragments are also included.

Humanized antibodies in which CDRs from a non-human
source are grafted onto human framework regions,
typically with alteration of some of the framework amino
acid residues, to provide antibodies which are less
immunogenic than the parent non-human antibodies, are
also included within the present invention.

II. Uses of MED1-Encoding Nucleic Acids,
MED1 Proteins and Antibodies Thereto

MED1 appears to be an important DNA repair
endonuclease which may play a role in mismatch repair.
Mutations in MED1 are associated with certain forms of
colon and endometrial cancer. The MED1 molecules of the
invention may be used to advantage in genetic screening
assays to identify those patients that may be at risk.

Screening assays may also be developed which assess
aberrant MED1 activity associated with Fragile X
syndrome and other diseases characterized by triplet
repeat expansion. Due to its methyl-CpG binding domain,
MED1 might be useful in the analysis of genome
methylation and of methylation-mediated DNA
transcription, replication and repair (for instance, by
cleaving methylated and non-methylated DNA in a
differential manner). Due to its endonuclease activity,
MED1 is expected to be useful in the context of DNA
manipulation technology. The employment of MED1 would
be of particular interest in the area of mutation
detection. Other endonucleases have been successfully
used to detect mutations based on recognition of
cleavage products of heteroduplex intermediates carrying
mismatches (Mashal R.D., Koontz J. and Sklar J. Nature
Genet. 9: 177-183, 1995; Smith J. and Modrich P. Proc.
Natl. Acad. Sci. USA 93: 4374-4379, 1996).

Additionally, MED1 nucleic acids, proteins and
antibodies thereto, according to this invention, may be
used as a research tool to identify other proteins that
are intimately involved in DNA recognition and repair
reactions. Biochemical elucidation of the DNA
recognition and repair capacity of MED1 will facilitate
the development of these novel screening assays for
assessing a patient's propensity for cancer and genetic
disease.

A. MED1-Encoding Nucleic Acids

MED1-encoding nucleic acids may be used for a
variety of purposes in accordance with the present
invention. MED1-encoding DNA, RNA, or fragments thereof
may be used as probes to detect the presence of and/or
expression of genes encoding MED1 proteins. Methods in
which MED1-encoding nucleic acids may be utilized as
probes for such assays include, but are not limited to:
(1) in situ hybridization; (2) Southern hybridization;
(3) northern hybridization; and (4) assorted
amplification reactions such as polymerase chain
reactions (PCR).

The MED1-encoding nucleic acids of the
invention may also be utilized as probes to identify
related genes from other animal species. As is well
known in the art, hybridization stringencies may be
adjusted to allow hybridization of nucleic acid probes
with complementary sequences of varying degrees of
homology. Thus, MED1-encoding nucleic acids may be used
to advantage to identify and characterize other genes of
varying degrees of relation to MED1, thereby enabling
further characterization of the DNA repair system.
Additionally, they may be used to identify genes
encoding proteins that interact with MED1 (e.g., by the
"interaction trap" technique), which should further
accelerate identification of the components involved in
DNA repair.

Nucleic acid molecules, or fragments thereof,
encoding MED1 may also be utilized to control the
production of MED1, thereby regulating the amount of
protein available to participate in DNA repair
reactions. Alterations in the physiological amount of
MED1 protein may dramatically affect the activity of
other protein factors involved in DNA repair.

The availability of MED1-encoding nucleic acids
enables the production of strains of laboratory mice
carrying part or all of the MED1 gene or mutated
sequences thereof. Such mice may provide an in vivo
model for cancer. Alternatively, the MED1 sequence
information provided herein enables the production of
knockout mice in which the endogenous gene encoding MED1
has been specifically inactivated. Methods of
introducing transgenes in laboratory mice are known to
those of skill in the art. Three common methods
include: (1) integration of retroviral vectors encoding
the foreign gene of interest into an early embryo; (2)
injection of DNA into the pronucleus of a newly
fertilized egg; and (3) the incorporation of genetically
manipulated embryonic stem cells into an early embryo.
Production of the transgenic mice described above will
facilitate the molecular elucidation of the role MED1
plays in embryonic development and cancer.

A transgenic mouse carrying the human MED1 gene is
generated by direct replacement of the mouse MED1 gene
with the human gene. These transgenic animals are
useful for drug screening studies as animal models for
human diseases and for eventual treatment of disorders
or diseases associated with biological activities
modulated by MED1. A transgenic animal carrying a
"knock out" of MED1 is useful for assessing the role of
MED1 in maintaining DNA fidelity.

As a means to define the role that MED1 plays in
mammalian systems, mice may be generated that cannot
make MED1 protein because of a targeted mutational
disruption of the MED1 gene.

The term "animal" is used herein to include all
vertebrate animals, except humans. It also includes an
individual animal in all stages of development,
including embryonic and fetal stages. A "transgenic
animal" is any animal containing one or more cells
bearing genetic information altered or received,
directly or indirectly, by deliberate genetic
manipulation at the subcellular level, such as by
targeted recombination or microinjection or infection
with recombinant virus. The term "transgenic animal" is
not meant to encompass classical cross-breeding or in
vitro fertilization, but rather is meant to encompass
animals in which one or more cells are altered by or
receive a recombinant DNA molecule. This molecule may
be specifically targeted to a defined genetic locus, be
randomly integrated within a chromosome, or it may be
extrachromosomally replicating DNA. The term "germ cell
line transgenic animal" refers to a transgenic animal in
which the genetic alteration or genetic information was
introduced into a germ line cell, thereby conferring the
ability to transfer the genetic information to
offspring. If such offspring, in fact, possess some or
all of that alteration or genetic information, then
they, too, are transgenic animals.

The alteration or genetic information may be
foreign to the species of animal to which the recipient
belongs, or foreign only to the particular individual
recipient, or may be genetic information already
possessed by the recipient. In the last case, the
altered or introduced gene may be expressed differently
than the native gene.

The altered MED1 gene generally should not fully
encode the same MED1 protein native to the host animal
and its expression product should be altered to a minor
or great degree, or absent altogether. However, it is
conceivable that a more modestly modified MED1 gene will
fall within the compass of the present invention if it
is a specific alteration.

The DNA used for altering a target gene may be
obtained by a wide variety of techniques that include,
but are not limited to, isolation from genomic sources,
preparation of cDNAs from isolated mRNA templates,
direct synthesis, or a combination thereof.

A type of target cell for transgene introduction is
the embryonal stem cell (ES). ES cells may be obtained
from pre-implantation embryos cultured in vitro (Evans
et al., (1981) Nature 292:154-156; Bradley et al.,
(1984) Nature 309:255-258; Gossler et al., (1986) Proc.
Natl. Acad. Sci. 83:9065-9069). Transgenes can be
efficiently introduced into the ES cells by standard
techniques such as DNA transfection or by
retrovirus-mediated transduction. The resultant transformed ES
cells can thereafter be combined with blastocysts from
a non-human animal. The introduced ES cells thereafter
colonize the embryo and contribute to the germ line of
the resulting chimeric animal.

One approach to the problem of determining the
contributions of individual genes and their expression
products is to use isolated MED1 genes to selectively
inactivate the wild-type gene in totipotent ES cells
(such as those described above) and then generate
transgenic mice. The use of gene-targeted ES cells in
the generation of gene-targeted transgenic mice was
described, and is reviewed elsewhere (Frohman et al.,
(1989) Cell 56:145-147; Bradley et al., (1992)
Bio/Technology 10:534-539).

Techniques are available to inactivate or alter any
genetic region to a desired mutation by using targeted
homologous recombination to insert specific changes into
chromosomal alleles. However, in comparison with
homologous extrachromosomal recombination, which occurs
at a frequency approaching 100%, homologous
plasmid-chromosome recombination was originally reported
to be detected only at frequencies between 10^-6 and
10^-3. Nonhomologous plasmid-chromosome interactions are
more frequent, occurring at levels 10^5-fold to
10^2-fold greater than comparable homologous insertion.
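
The practical consequence of these frequencies can be
illustrated with a short calculation. The sketch below is
not part of the disclosed methods; it assumes that clones
are screened independently and that the fold excess of
nonhomologous over homologous insertion quoted above can be
read as the ratio of random to targeted integrants. The
function names and the 95% confidence level are
illustrative choices.

```python
import math


def targeted_fraction(nonhomologous_fold_excess: float) -> float:
    """Fraction of integrants that are homologous, assuming nonhomologous
    insertion is `nonhomologous_fold_excess` times more frequent."""
    return 1.0 / (1.0 + nonhomologous_fold_excess)


def clones_to_screen(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - p)**n >= confidence, i.e. the number of
    independent clones needed to find at least one targeted clone."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    # The two extremes of the range quoted in the text (10^2- and 10^5-fold).
    for fold in (1e2, 1e5):
        p = targeted_fraction(fold)
        n = clones_to_screen(p)
        print(f"{fold:.0e}-fold excess: targeted fraction {p:.2e}, "
              f"screen at least {n} clones")
```

Under these assumptions the number of clones to be screened
grows roughly in proportion to the fold excess, which is
why enrichment strategies for rare homologous recombinants
are of interest.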

To overcome this low proportion of targeted
recombination in murine ES cells, various strategies
have been developed to detect or select rare homologous
recombinants. One approach for detecting homologous
alteration events uses the polymerase chain reaction
(PCR) to screen pools of transformant cells for
homologous insertion, followed by screening of
individual clones. Alternatively, a positive genetic
selection approach has been developed in which a marker
gene is constructed which will only be active if
homologous insertion occurs, allowing these recombinants
to be selected directly. One of the most powerful
approaches developed for selecting homologous
recombinants is the positive-negative selection (PNS)
method developed for genes for which no direct selection
of the alteration exists. The PNS method is more
efficient for targeting genes which are not expressed at
high levels because the marker gene has its own
promoter. Non-homologous recombinants are selected
against by using the Herpes Simplex virus thymidine
kinase (HSV-TK) gene and selecting against its
nonhomologous insertion with effective herpes drugs such
as gancyclovir (GANC) or
1-(2-deoxy-2-fluoro-β-D-arabinofuranosyl)-5-iodouracil
(FIAU). By this counter selection, the number of
homologous recombinants in the surviving transformants
can be increased.

As used herein, a "targeted gene" or "knock-out" is
a DNA sequence introduced into the germline of a
non-human animal by way of human intervention, including
but not limited to, the methods described herein. The
targeted genes of the invention include DNA sequences
which are designed to specifically alter cognate
endogenous alleles.

Methods of use for the transgenic mice of the 
invention are also provided herein. Therapeutic agents 
for the treatment or prevention of cancer may be 
screened in studies using MED1 transgenic mice. 

In another embodiment of the invention, MED1 
knockout mice may be used to produce an array of 
monoclonal antibodies specific for MED1 protein. 

As described above, MED1-encoding nucleic acids are
also used to advantage to produce large quantities of
substantially pure MED1 protein, or selected portions
thereof.

B. MED1 Protein and Antibodies

Purified MED1, or fragments thereof, may be
used to produce polyclonal or monoclonal antibodies
which also may serve as sensitive detection reagents for
the presence and accumulation of MED1 (or complexes
containing MED1) in mammalian cells. Recombinant
techniques enable expression of fusion proteins
containing part or all of the MED1 protein. The full
length protein or fragments of the protein may be used
to advantage to generate an array of monoclonal
antibodies specific for various epitopes of the protein,
thereby providing even greater sensitivity for detection
of the protein in cells.

Polyclonal or monoclonal antibodies
immunologically specific for MED1 may be used in a
variety of assays designed to detect and quantitate the
protein. Such assays include, but are not limited to:
(1) flow cytometric analysis; (2) immunochemical
localization of MED1 in tumor cells; and (3) immunoblot
analysis (e.g., dot blot, Western blot) of extracts from
various cells. Additionally, as described above,
anti-MED1 can be used for purification of MED1 (e.g.,
affinity column purification, immunoprecipitation).

From the foregoing discussion, it can be seen
that MED1-encoding nucleic acids, MED1 expressing
vectors, MED1 proteins and anti-MED1 antibodies of the
invention can be used to detect MED1 gene expression and
alter MED1 protein accumulation for purposes of
assessing the genetic and protein interactions involved
in the recognition and repair of DNA damage.

Exemplary approaches for detecting MED1 nucleic
acid or polypeptides/proteins include:

a) comparing the sequence of nucleic acid in the
sample with the MED1 nucleic acid sequence to determine
whether the sample from the patient contains
mutations; or

b) determining the presence, in a sample from a
patient, of the polypeptide encoded by the MED1 gene
and, if present, determining whether the polypeptide is
full length, and/or is mutated, and/or is expressed at
the normal level; or

c) using DNA restriction mapping to compare the
restriction pattern produced when a restriction enzyme
cuts a sample of nucleic acid from the patient with the
restriction pattern obtained from normal MED1 gene or
from known mutations thereof; or,

d) using a specific binding member capable of
binding to a MED1 nucleic acid sequence (either normal
sequence or known mutated sequence), the specific
binding member comprising nucleic acid hybridizable with
the MED1 sequence, or substances comprising an antibody
domain with specificity for a native or mutated MED1
nucleic acid sequence or the polypeptide encoded by it,
the specific binding member being labelled so that
binding of the specific binding member to its binding
partner is detectable; or,

e) using PCR involving one or more primers based on
normal or mutated MED1 gene sequence to screen for
normal or mutant MED1 gene in a sample from a patient.

A "specific binding pair" comprises a specific
binding member (sbm) and a binding partner (bp) which
have a particular specificity for each other and which
in normal conditions bind to each other in preference to
other molecules. Examples of specific binding pairs are
antigens and antibodies, ligands and receptors and
complementary nucleotide sequences. The skilled person
is aware of many other examples and they do not need to
be listed here. Further, the term "specific binding pair"
is also applicable where either or both of the specific
binding member and the binding partner comprise a part
of a large molecule. In embodiments in which the
specific binding pair are nucleic acid sequences, they
will be of a length to hybridize to each other under
conditions of the assay, preferably greater than 10
nucleotides long, more preferably greater than 15 or 20
nucleotides long.

In most embodiments for screening for cancer
susceptibility alleles, the MED1 nucleic acid in the
sample will initially be amplified, e.g. using PCR, to
increase the amount of the analyte as compared to other
sequences present in the sample. This allows the target
sequences to be detected with a high degree of
sensitivity if they are present in the sample. This
initial step may be avoided by using highly sensitive
array techniques that are becoming increasingly
important in the art.

The identification of the MED1 gene and its
association with cancer paves the way for aspects of the
present invention to provide the use of materials and
methods, such as are disclosed and discussed above, for
establishing the presence or absence in a test sample of
a variant form of the gene, in particular an allele or
variant specifically associated with cancer, especially
colorectal or endometrial cancer. This may be for
diagnosing a predisposition of an individual to cancer.
It may be for diagnosing cancer of a patient with the
disease as being associated with the gene.
This allows for planning of appropriate therapeutic
and/or prophylactic measures, permitting streamlining
of treatment. The approach further streamlines
treatment by targeting those patients most likely to
benefit.

According to another aspect of the invention,
methods of screening drugs for cancer therapy to
identify suitable drugs for restoring MED1 product
functions are provided. A major problem in cancer
treatment is the development of drug resistance or
ionizing radiation resistance by the tumor cells which
eventually leads to failure of therapy. Recent studies
have revealed that inactivation of DNA mismatch repair
is an important mechanism of resistance to many
chemotherapeutic drugs used in the clinic (Fink D., Aebi
S. and Howell S.B. (1998). Clinical Cancer Res. 4: 1-6).
In fact, a functional mismatch repair system appears to
be required for killing by many alkylating agents and
platinum compounds. Resistance/tolerance to those
agents is associated with loss of expression or function
of mismatch repair genes: in the absence of a functional
mismatch repair system, DNA damage accumulates but fails
to trigger apoptosis (Fink D., Aebi S. and Howell S.B.
(1998), supra). Defects in DNA mismatch repair genes
(hMLH1, hPMS2, hMSH2 and hMSH6) have been found in cell
lines and primary tumors resistant to those
chemotherapeutic agents. Thus, loss of MED1
function/expression may be associated with tumor drug
resistance. Restoration of MED1 function by gene
transfer or by pharmacological means would be expected
to overcome resistance to treatment.

The MED1 polypeptide or fragment employed in drug
screening assays may either be free in solution, affixed
to a solid support or within a cell. One method of drug
screening utilizes eukaryotic or prokaryotic host cells
which are stably transformed with recombinant
polynucleotides expressing the polypeptide or fragment,
preferably in competitive binding assays. Such cells,
either in viable or fixed form, can be used for standard
binding assays. One may determine, for example,
formation of complexes between a MED1 polypeptide or
fragment and the agent being tested, or examine the
degree to which the formation of a complex between a
MED1 polypeptide or fragment and a known ligand is
interfered with by the agent being tested.

Another technique for drug screening provides high
throughput screening for compounds having suitable
binding affinity to the MED1 polypeptides and is
described in detail in Geysen, PCT published application
WO 84/03564, published on Sep. 13, 1984. Briefly
stated, large numbers of different, small peptide test
compounds are synthesized on a solid substrate, such as
plastic pins or some other surface. The peptide test
compounds are reacted with MED1 polypeptide and washed.
Bound MED1 polypeptide is then detected by methods well
known in the art.

Purified MED1 can be coated directly onto plates
for use in the aforementioned drug screening techniques.
However, non-neutralizing antibodies to the polypeptide
can be used as capture antibodies to immobilize the MED1
polypeptide on the solid phase.

This invention also contemplates the use of
competitive drug screening assays in which neutralizing
antibodies capable of specifically binding the MED1
polypeptide compete with a test compound for binding to
the MED1 polypeptide or fragments thereof. In this
manner, the antibodies can be used to detect the
presence of any peptide which shares one or more
antigenic determinants of the MED1 polypeptide.

A further technique for drug screening involves the
use of host eukaryotic cell lines or cells (such as
described above) which have a nonfunctional MED1 gene.
These host cell lines or cells are defective at the MED1
polypeptide level. The host cell lines or cells are
grown in the presence of drug compound. The rate of
growth of the host cells is measured to determine if the
compound is capable of regulating the growth of MED1
defective cells.

The goal of rational drug design is to produce
structural analogs of biologically active polypeptides
of interest or of small molecules with which they
interact (e.g., agonists, antagonists, inhibitors) in
order to fashion drugs which are, for example, more
active or stable forms of the polypeptide, or which,
e.g., enhance or interfere with the function of a
polypeptide in vivo. See, e.g., Hodgson, (1991)
Bio/Technology 9:19-21. In one approach, one first
determines the three-dimensional structure of a protein
of interest (e.g., MED1 polypeptide) or, for example, of
the MED1-DNA complex, by x-ray crystallography, by
nuclear magnetic resonance, by computer modeling or most
typically, by a combination of approaches. Less often,
useful information regarding the structure of a
polypeptide may be gained by modeling based on the
structure of homologous proteins. An example of
rational drug design is the development of HIV protease
inhibitors (Erickson et al., (1990) Science 249:527-
533). In addition, peptides (e.g., MED1 polypeptide)
may be analyzed by an alanine scan (Wells, (1991) Meth.
Enzym. 202:390-411). In this technique, an amino acid
residue is replaced by Ala, and its effect on the
peptide's activity is determined. Each of the amino acid
residues of the peptide is analyzed in this manner to
determine the important regions of the peptide.
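
The alanine scan described above amounts to enumerating
every single-residue substitution to Ala. The following
minimal sketch shows that enumeration only; the example
peptide is a hypothetical placeholder and is not taken from
the MED1 sequence, and the activity measurement for each
variant is an experimental step that is not modeled here.

```python
from typing import List, Tuple


def alanine_scan(peptide: str) -> List[Tuple[int, str, str]]:
    """Return (position, original residue, variant sequence) for each
    single-residue substitution to alanine.

    Positions are 1-based. Residues that are already alanine are skipped
    because the substitution would leave the peptide unchanged.
    """
    variants = []
    for index, residue in enumerate(peptide):
        if residue == "A":
            continue
        variant = peptide[:index] + "A" + peptide[index + 1:]
        variants.append((index + 1, residue, variant))
    return variants


if __name__ == "__main__":
    # Hypothetical six-residue peptide in one-letter code, for illustration.
    for position, residue, variant in alanine_scan("MKRGSA"):
        print(f"{residue}{position}A: {variant}")
```

Each variant would then be assayed, and positions at which
the substitution alters activity mark the regions of the
peptide that matter for function.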

It is also possible to isolate a target-specific
antibody, selected by a functional assay, and then to
solve its crystal structure. In principle, this
approach yields a pharmacore upon which subsequent drug
design can be based. It is possible to bypass protein
crystallography altogether by generating anti-idiotypic
antibodies (anti-ids) to a functional, pharmacologically
active antibody. As a mirror image of a mirror image,
the binding site of the anti-ids would be expected to be
an analog of the original molecule. The anti-id could
then be used to identify and isolate peptides from banks
of chemically or biologically produced peptides.
Selected peptides would then act as the pharmacore.

Thus, one may design drugs which have, e.g.,
improved MED1 polypeptide activity or stability or which
act as inhibitors, agonists, antagonists, etc. of MED1
polypeptide activity. By virtue of the availability of
cloned MED1 sequences, sufficient amounts of the MED1
polypeptide may be made available to perform such
analytical studies as x-ray crystallography. In
addition, the knowledge of the MED1 protein sequence
provided herein will guide those employing computer
modeling techniques in place of, or in addition to x-ray
crystallography.

III. Therapeutics

A. Pharmaceuticals and Peptide Therapies

The MED1 polypeptides/proteins, antibodies,
peptides and nucleic acids of the invention can be
formulated in pharmaceutical compositions. These
compositions may comprise, in addition to one of the
above substances, a pharmaceutically acceptable
excipient, carrier, buffer, stabilizer or other
materials well known to those skilled in the art. Such
materials should be non-toxic and should not interfere
with the efficacy of the active ingredient. The precise
nature of the carrier or other material may depend on
the route of administration, e.g. oral, intravenous,
cutaneous or subcutaneous, nasal, intramuscular, or
intraperitoneal routes.

Whether it is a polypeptide, antibody, peptide,
nucleic acid molecule, small molecule or other
pharmaceutically useful compound according to the
present invention that is to be given to an individual,
administration is preferably in a "prophylactically
effective amount" or a "therapeutically effective amount"
(as the case may be, although prophylaxis may be
considered therapy), this being sufficient to show
benefit to the individual.

B. Methods of Gene Therapy 

As a further alternative, the nucleic acid encoding
the authentic biologically active MED1 polypeptide could
be used in a method of gene therapy, to treat a patient
who is unable to synthesize the active "normal"
polypeptide or unable to synthesize it at the normal
level, thereby providing the effect elicited by
wild-type MED1 and suppressing the occurrence of
"abnormal" MED1 lacking the ability to perform or effect
DNA repair.

Vectors such as viral vectors have been used in the
prior art to introduce genes into a wide variety of
different target cells. Typically the vectors are
exposed to the target cells so that transformation can
take place in a sufficient proportion of the cells to
provide a useful therapeutic or prophylactic effect from
the expression of the desired polypeptide. The
transfected nucleic acid may be permanently incorporated
into the genome of each of the targeted tumor cells,
providing long lasting effect, or alternatively the
treatment may have to be repeated periodically.

A variety of vectors, both viral vectors and
plasmid vectors, are known in the art; see US Patent No.
5,252,479 and WO 93/07282. In particular, a number of
viruses have been used as gene transfer vectors,
including papovaviruses, such as SV40, vaccinia virus,
herpes viruses including HSV and EBV, and retroviruses.
Many gene therapy protocols in the prior art have
employed disabled murine retroviruses.

Gene transfer techniques which selectively target
the MED1 nucleic acid to colorectal tissues are
preferred. Examples of this include receptor-mediated
gene transfer, in which the nucleic acid is linked to a
protein ligand via polylysine, with the ligand being
specific for a receptor present on the surface of the
target cells.

The following examples are provided to illustrate 
certain embodiments of the invention. They are not 
intended to limit the invention in any way. 

EXAMPLE I 

The methods described below have been used to
advantage to isolate the MED1 encoding nucleic acids of
the invention.

A. Interaction trap screen, cDNA and genomic DNA 
isolation. 

Yeast interaction trap screening (Gyuris et al.,
(1993) Cell 75:791-803; Golemis et al., (1996) Yeast
Interaction Trap/Two Hybrid Systems to Identify
Interacting Proteins, Unit 20.1.1-20.1.28 in Current
Protocols in Molecular Biology, eds. Ausubel, F.M. et
al., John Wiley & Sons, NY) was used to isolate cDNAs
encoding proteins able to interact with hMLH1. The
hMLH1 open reading frame was inserted into the
polylinker of the pEG202 vector (Golemis et al., 1996,
supra). The resulting "bait" construct pEG202-t-hMLH1
expresses the hMLH1 protein (amino acids 1-756) as a
carboxy terminal fusion to the LexA DNA binding protein.
Saccharomyces cerevisiae strain EGY191 (Estojak et al.,
(1995) Mol. Cell Bio. 15:5820-5829) was transformed with
the bait construct and with the LacZ reporter plasmid
pSH18-34 (Golemis et al., 1996, supra).

The EGY191/pSH18-34/pEG202-t-hMLH1 cells were
supertransformed with a human fetal brain cDNA library
constructed in the vector pJG4-5. This vector directs
the synthesis of proteins fused to the B42
transcriptional activator domain (Ruden et al., (1991)
Nature 350:250-252) and the expression is controlled by
the galactose-inducible GAL1 promoter. Approximately 4
x 10^5 independent transformants were obtained in yeast
and used for screening. For selection of the positive
interactors, the supertransformed cells were cultured on
leucine-minus / galactose solid medium. Colonies
growing on this medium after 3-5 days incubation were
subcultured on leucine-minus or X-Gal media containing
either glucose or galactose as a carbon source.
Twenty-two colonies growing on leucine-minus / galactose
but not leucine-minus / glucose medium and turning blue
on X-Gal / galactose but not X-Gal / glucose plates were
further characterized.
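
The selection criterion applied to the subcultured colonies
can be summarized as a simple filter: both reporters must
respond on galactose (where the library fusion is
expressed) and neither on glucose. The sketch below is
illustrative only; the record type, field names and example
colonies are hypothetical and do not represent data from
the screen.

```python
from dataclasses import dataclass


@dataclass
class ColonyPhenotype:
    """Observed phenotypes of one subcultured colony (hypothetical record)."""
    name: str
    grows_leu_minus_galactose: bool
    grows_leu_minus_glucose: bool
    blue_xgal_galactose: bool
    blue_xgal_glucose: bool


def is_galactose_dependent_positive(colony: ColonyPhenotype) -> bool:
    """True when both reporters are active on galactose only, as required
    of the colonies that were further characterized."""
    return (
        colony.grows_leu_minus_galactose
        and not colony.grows_leu_minus_glucose
        and colony.blue_xgal_galactose
        and not colony.blue_xgal_glucose
    )


if __name__ == "__main__":
    # Hypothetical colonies illustrating one pass and two failures.
    colonies = [
        ColonyPhenotype("candidate-1", True, False, True, False),
        ColonyPhenotype("candidate-2", True, True, True, True),
        ColonyPhenotype("candidate-3", True, False, False, False),
    ]
    kept = [c.name for c in colonies if is_galactose_dependent_positive(c)]
    print("retained:", kept)
```

Requiring galactose dependence for both reporters excludes
colonies whose growth or color does not depend on
expression of the library-encoded fusion protein.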

Plasmid DNA encoding putative hMLH1 interactors was
isolated from these colonies (clones f1 through f22),
transferred first to KC8 and then to XL-1 blue E. coli
strains, and sequenced. These and subsequent sequencing
reactions were performed on double stranded DNA with the
ABI automated sequencer 377 using dye terminator
chemistry (Perkin Elmer). Sequence assembly and
analysis were performed with the Genetics Computer Group
software (Genetics Computer Group, 1994). Since the f5
clone (later named MED1) was shorter (0.8 kb 3' of B42)
than the mRNA transcript detected in human tissues by
Northern blot analysis (approximately 2.4 kb), an
f5-derived probe was used to screen three additional
cDNA lambda libraries. The libraries, derived from
human fetal brain (Stratagene and Clontech) and from the
ovarian cancer cell line C200 (gift of Drs. A. Godwin
and G. Kruh), were screened following standard
procedures as previously described (Bellacosa et al.,
1994, supra).

Screening of a human genomic DNA library prepared
in the lambda phage FIX II (Stratagene) with the f5/MED1
cDNA probe yielded six clones. One of these clones
(#16) was further characterized and subcloned in plasmid
vectors. Sequence analysis of the subclones and
comparison to the MED1 cDNA sequence allowed mapping of
seven MED1 exons (exons 2 through 8; Fig. 14). The
remaining exon (exon 1) and the intervening intron
between exon 1 and exon 2 were cloned by PCR utilizing
human genomic DNA as template and the primers of
Sequence I.D. Nos. 6 and 20. SEQ ID NO: 20 is
CAAATCTTCCTGCTGTCTTCC, which maps within exon 2. Table
I provides suitable primer sets for amplifying exons of
the MED1 gene.

This human genomic clone has been deposited with 
the American Type Culture Collection, 10801 University 
Blvd., Manassas, VA 20110-2209 on July 28, 1998 under 
the terms of the Budapest Treaty, Accession Number: Not 
yet assigned. The sequence of the human genomic clone is 
shown in Figure 20, SEQ ID NO: 22. 

TABLE I. OLIGONUCLEOTIDE PRIMERS FOR MED1

               5' primer                 3' primer

exon 1         GTCTGGGGCGCTTTCGCAA       CCACACACTGTCCACTCTCCCG
               (SEQ ID NO: 6)            (SEQ ID NO: 7)

exon 2         ACTCCCATAGCACAAGACTGG     GCTATGCTCCCACTACCTGC
               (SEQ ID NO: 8)            (SEQ ID NO: 9)

exon 3         CCCTTCTATTTACTAGCAGTA     GATGCAGCATATAAATTTCTC
               (SEQ ID NO: 10)           (SEQ ID NO: 11)

exons 4        TGCATCCCTCAATATTGCTTT     TCAATTCAGTGCTTTCTCCCT
and 5          (SEQ ID NO: 12)           (SEQ ID NO: 13)

exon 6         AGCCCACCTGGAGTCTTGTAA     AAAGTTTAAGGTGTGGCTCTC
               (SEQ ID NO: 14)           (SEQ ID NO: 15)

exon 7         GAAGCTGACCTGATAATGTGG     CTTATTTTGCCTCAGAGACCA
               (SEQ ID NO: 16)           (SEQ ID NO: 17)

exon 8         TATCGTAATGTACTGTCCCCC     GCTTTAGCAAGGCTGATAGAA
               (SEQ ID NO: 18)           (SEQ ID NO: 19)
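
Basic properties of the Table I primers can be tabulated
with a few lines of code. The sketch below is offered as an
illustration and is not part of the disclosed methods: it
reports length and GC fraction, and a rough melting
temperature from the Wallace rule of thumb (2 degrees C per
A or T plus 4 degrees C per G or C). The use of that rule,
and the function names, are assumptions made here; no
annealing conditions are stated in the text.

```python
# Primer pairs transcribed from Table I as (5' primer, 3' primer).
TABLE_I = {
    "exon 1": ("GTCTGGGGCGCTTTCGCAA", "CCACACACTGTCCACTCTCCCG"),
    "exon 2": ("ACTCCCATAGCACAAGACTGG", "GCTATGCTCCCACTACCTGC"),
    "exon 3": ("CCCTTCTATTTACTAGCAGTA", "GATGCAGCATATAAATTTCTC"),
    "exons 4 and 5": ("TGCATCCCTCAATATTGCTTT", "TCAATTCAGTGCTTTCTCCCT"),
    "exon 6": ("AGCCCACCTGGAGTCTTGTAA", "AAAGTTTAAGGTGTGGCTCTC"),
    "exon 7": ("GAAGCTGACCTGATAATGTGG", "CTTATTTTGCCTCAGAGACCA"),
    "exon 8": ("TATCGTAATGTACTGTCCCCC", "GCTTTAGCAAGGCTGATAGAA"),
}


def gc_fraction(primer: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in primer) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in degrees C for short oligos:
    2 per A/T plus 4 per G/C. An approximation, assumed here."""
    at = sum(base in "AT" for base in primer)
    gc = sum(base in "GC" for base in primer)
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for target, pair in TABLE_I.items():
        for label, primer in zip(("5'", "3'"), pair):
            print(f"{target:14s} {label} len={len(primer):2d} "
                  f"GC={gc_fraction(primer):.2f} Tm~{wallace_tm(primer)} C")
```

Such a tabulation is a convenient first check that the two
primers of a pair have broadly similar melting behavior
before amplification conditions are chosen empirically.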
Screening at low stringency of a mouse 129/SVJ
strain genomic DNA library prepared in the lambda phage
FIX II (Stratagene) with the same HindIII-HindIII
fragment derived from the human MED1 cDNA probe (from
nucleotide 1513-1935 of SEQ ID NO: 1) yielded one clone.
This clone (#3) was further characterized and subcloned
in plasmid vectors. Sequence analysis of the subclones
and comparison to the human MED1 cDNA and genomic
sequence allowed mapping of seven mouse MED1 exons
(exons 1 through 7; Fig. 16). Assembling of the mouse
MED1 exons allowed the derivation of a partial sequence
of the mouse MED1 cDNA (Fig. 17). From the latter
sequence a partial predicted amino acid sequence of the
mouse MED1 protein was derived and it was shown to be
highly conserved by comparison to the human MED1 protein
sequence (Fig. 18). This mouse genomic clone has been
deposited with the American Type Culture Collection,
10801 University Blvd., Manassas, VA 20110-2209 on July
28, 1998 under the terms of the Budapest Treaty,
Accession Number: Not yet assigned. The sequence of the
mouse genomic clone is shown in Figure 19, SEQ ID NO: 21.

B. Northern and Southern blot analysis.

A multiple tissue northern blot of poly-A selected
RNA (Clontech) was hybridized under high-stringency
conditions to a 32P-labeled 0.8 kb f5 probe. The blot was
washed to a final stringency of 0.1 x SSC/0.1% SDS (1 x
SSC is 0.15 M NaCl/0.015 M sodium citrate) at 65°C for
40 minutes, and then exposed to X-ray film (Kodak X-Omat
AR) at -70°C.

For the "Zoo" blot experiment, genomic DNA prepared
from vertebrate species was digested with the
restriction enzyme HindIII (New England Biolabs),
separated on a 0.8% agarose gel and transferred to a
nylon membrane. The membrane was hybridized to a
32P-labelled human MED1 cDNA probe (HindIII-HindIII
fragment from nucleotide 1513 to nucleotide 1935 of the
Sequence I.D. No. 1). Hybridization was performed in a
solution containing 35% formamide, 6x SSC, 5x Denhardt's
solution, 20 mM sodium phosphate pH 6.5, 20 micrograms/ml
of sheared E. coli genomic DNA and 0.5% sodium dodecyl
sulfate (SDS). The filter was washed twice at room
temperature and twice at 65°C in a solution containing
4x SSC and 0.1% SDS. Hybridization signals were
revealed by autoradiography.

Hybridization of the HindIII-HindIII fragment probe
(from nucleotide 1513 to nucleotide 1935 of the Sequence
I.D. No. 1) at low stringency to a "zoo" blot revealed
conservation of the MED1 gene among vertebrates. See
Figure 15.
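
The SSC strengths quoted in this section (0.1x for the
northern wash, 6x for the zoo blot hybridization and 4x for
the zoo blot washes) can be converted to molar
concentrations from the 1x definition given above (0.15 M
NaCl, 0.015 M sodium citrate). The sketch below performs
only that multiplication; the function name and the
rounding to six decimal places are illustrative choices.

```python
# 1x SSC as defined in the text, in mol/L.
SSC_1X = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc_molarity(fold: float) -> dict:
    """Molar concentration of each solute in `fold`-x SSC.

    Values are rounded to six decimal places to suppress floating-point
    noise in the printed result.
    """
    if fold < 0:
        raise ValueError("fold must be non-negative")
    return {solute: round(fold * molar, 6) for solute, molar in SSC_1X.items()}


if __name__ == "__main__":
    for fold in (0.1, 4, 6):
        conc = ssc_molarity(fold)
        print(f"{fold}x SSC: {conc['NaCl']} M NaCl, "
              f"{conc['sodium citrate']} M sodium citrate")
```

Lower salt corresponds to higher wash stringency, so the
0.1x SSC northern wash is considerably more stringent than
the 4x SSC zoo blot washes, consistent with the high- and
low-stringency designations used in the text.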

C. In vitro transcription and translation.

Coupled in vitro transcription and translation was
conducted with a rabbit reticulocyte lysate- and T7 RNA
polymerase-based kit (Promega), following the
manufacturer's recommendations and employing
35S-methionine (Amersham).

D. Cell culture, expression constructs, and
transfections.

NIH 3T3 cells were cultured in Dulbecco's modified
Eagle's medium supplemented with 10% calf serum,
penicillin (50 units/ml), streptomycin (50 µg/ml), and
kanamycin (100 µg/ml). The expression constructs of
MED1 were generated in the CMV promoter-based CMV5
vector, a derivative of CMV4 (Andersson et al., (1989)
J. Biol. Chem. 264:8222-8229). For construction of the
hemagglutinin epitope carboxy-terminally tagged MED1
plasmid, the MED1 cDNA was inserted in place of the
Gfi-1 ZN mutant construct open reading frame (Grimes et
al. (1996) Mol. Cell Bio. 16:6263-6272), a gift of Dr.
Leighton Grimes. For construction of the hemagglutinin
epitope amino terminally-tagged MED1 plasmids M1 and M2,
an XbaI site was inserted by polymerase chain reaction
immediately upstream of the ATG codons at nucleotide
positions 142 and 262, respectively. Then the MED1 open
reading frame, excised with XbaI and NsiI (blunted), was
inserted in place of the Akt gene in the CMV5
hemagglutinin tag-Akt construct (Datta et al., (1996) J.
Biol. Chem. 271:30835-30839).

Transient transfections of NIH 3T3 cells seeded in
6-well plates at 0.15 x 10^6 cells/well were carried out
using 1.5 µg of DNA and 6 µl of lipofectamine (Life
Technologies, Inc.), following the manufacturer's
protocol. Forty-eight hours after transfection, cells
were washed twice with Dulbecco phosphate buffered
saline and then lysed with RIPA buffer (10mM sodium
phosphate pH 7.0, 150mM NaCl, 1% w/v sodium
deoxycholate, 1% v/v Nonidet P-40, 0.1% w/v sodium
dodecylsulfate, 1mM phenylmethylsulfonyl fluoride,
2µg/ml aprotinin, 2µg/ml leupeptin, 50mM NaF, 1mM sodium
pyrophosphate, 1mM sodium orthovanadate, 1mM
dithiothreitol, and 2mM EDTA).

E. Western blotting.

Cell lysates were separated by sodium
dodecylsulfate-polyacrylamide gel electrophoresis
(SDS-PAGE) in 8.5% gels and transferred to Immobilon P
membranes (Millipore) by electroblotting with a Genie
apparatus (Idea Scientific Co.) in a buffer containing
25mM Tris-HCl pH 8, 190mM glycine and 20% v/v methanol.
Following overnight incubation in 5% dry milk in
Tris-buffered saline (TBS: 0.9% w/v NaCl, 10mM Tris-HCl
pH 7.4, 0.05% w/v MgCl2), the membrane was incubated for
1 hour at room temperature with the anti-hemagglutinin
tag monoclonal antibody 12CA5 (Boehringer) in 2% dry
milk in TBS. After three 10-minute washes in TBS
supplemented with 0.1% v/v Tween-20, the membrane was
incubated for 40 minutes at room temperature with an
anti-mouse secondary antibody conjugated to horseradish
peroxidase (Amersham). Following washing, the bound
secondary antibody was detected by enhanced
chemiluminescence (Amersham).

F. Fluorescence in situ hybridization.

Metaphase spreads from normal human lymphocytes
were prepared according to published methods (Fan et al.
(1990) Proc. Natl. Acad. Sci. 87:6223-6227). Nick
translation was used to label a MED1 genomic DNA
subclone with biotin-16-dUTP. Three hundred ng of the
probe were then mixed with 150 µg of human Cot-1 DNA
(Life Technologies Inc.) and 50 µg salmon sperm DNA to
block repetitive elements. The DNA was denatured at 75°C
for 5 minutes and then reannealed for 1 hour at 37°C
prior to hybridization to metaphase spreads overnight at
37°C. The MED1 signal was detected with fluorescein
isothiocyanate-labeled avidin (Oncor), whereas the
chromosomes were counterstained with propidium iodide
(Oncor). Metaphase spreads were observed using a Zeiss
Axiophot microscope and images were captured by a cooled
CCD camera (Photometrics) connected to a computer
workstation. To identify the precise chromosomal
location of the probe, the separate digitized images of
FITC and propidium iodide were merged using Oncor
version 1.6 software.

G. Electromobility shift analysis.

Transient transfections of 293 cells seeded in 10-cm
dishes were carried out using 12 µg of DNA and 48 µl
lipofectamine (Life Technologies, Inc.), following the
manufacturer's protocol. Seventy-two hours after
transfection, cells were washed twice with Dulbecco's
phosphate buffered saline and then lysed with NP-40
lysis buffer (0.5% Nonidet P-40, 10% glycerol, 137 mM
NaCl, 20 mM Tris-HCl, pH 7.4) containing 1 mM
phenylmethylsulfonylfluoride, 2 µg/ml aprotinin, 2 µg/ml
leupeptin, 1 mM NaF, 1 mM sodium pyrophosphate, 1 mM
sodium orthovanadate, and 1 mM dithiothreitol. Nuclei
were disrupted by sonication with a sonic dismembrator
(Fisher). Flag-MED1 was immunoprecipitated from the
cell lysates with an anti-Flag antibody coupled to
agarose beads (Kodak) and then eluted in a 50 µl volume
with a solution containing a molar excess of Flag
peptide (Kodak) in electromobility shift analysis (EMSA)
buffer (10 mM Tris-HCl, pH 7.5, 50 mM NaCl, 0.5 mM EDTA,
5% glycerol). A double-stranded oligonucleotide
containing five fully methylated CpG sites was generated
by annealing the following oligonucleotides (M =
5-methylcytosine):
Sequence I.D. No. 3:

5'-GCGAATTCMGTGCGAMGAAGCMGGACGATMGACCAGMGCTCGAGCA-3'
Sequence I.D. No. 4:

5'-GTGCTCGAGMGCTGGTMGATCGTCMGGCTTMGTCGCAMGGAATTCG-3'
The double-stranded oligonucleotide was labeled with
32P-α-dCTP and Klenow enzyme. EMSA was conducted as
described previously (Durand et al., (1988) Mol. Cell.
Biol. 8:1715-1724). Briefly, binding of MED1 to labeled
oligonucleotides was carried out by incubating 1 µl out
of 50 µl of the MED1 eluate, 7 x 10^4 cpm of labeled
oligonucleotides and 4 µg of poly(dI-dC) in EMSA buffer
(final volume of 20 µl) at room temperature.
Competition was carried out in the presence of 100 ng
(100-fold excess) of the cold oligonucleotide. Binding
reactions were separated on a 6% non-denaturing
polyacrylamide gel and visualized by autoradiography of
the dried gel.
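
The pairing of the two oligonucleotides of Sequence I.D. Nos. 3 and 4
can be checked with a short script. The following Python sketch is an
illustration only and is not part of the disclosed method; the helper
names (as_cytosine, revcomp) are hypothetical. It treats M as cytosine
for base pairing and checks that the annealed duplex has 45 paired
positions with one unpaired 5' G at each end (which would be consistent
with fill-in labeling by Klenow enzyme and labeled dCTP), and that each
methylated CpG on one strand faces a methylated CpG on the other.

```python
# Illustrative check of the duplex formed by Sequence I.D. Nos. 3 and 4.
# M denotes 5-methylcytosine, which base-pairs with G like cytosine.
SEQ3 = "GCGAATTCMGTGCGAMGAAGCMGGACGATMGACCAGMGCTCGAGCA"
SEQ4 = "GTGCTCGAGMGCTGGTMGATCGTCMGGCTTMGTCGCAMGGAATTCG"

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def as_cytosine(seq):
    """Replace the methylcytosine symbol M with C for pairing purposes."""
    return seq.replace("M", "C")


def revcomp(seq):
    """Reverse complement of a sequence, reading M as C."""
    return "".join(COMPLEMENT[base] for base in reversed(as_cytosine(seq)))


# The strands anneal with a one-nucleotide stagger: everything except the
# 5'-terminal base of each strand is paired.
duplex_is_complementary = as_cytosine(SEQ4)[1:] == revcomp(SEQ3)[:-1]
five_prime_overhangs = (SEQ3[0], SEQ4[0])

# With this stagger, SEQ3[i] pairs with SEQ4[len(SEQ3) - i].
# For every M at index i in SEQ3, the G at i + 1 must pair with an M.
methylation_is_symmetric = all(
    SEQ4[len(SEQ3) - (i + 1)] == "M"
    for i, base in enumerate(SEQ3)
    if base == "M"
)

print(duplex_is_complementary, five_prime_overhangs, methylation_is_symmetric)
```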

For the electromobility shift assay employing the
purified methyl-CpG binding domain (MBD) of MED1, the
methylated probe was assembled by annealing the two
complementary oligonucleotides of Sequence I.D. No. 3
and Sequence I.D. No. 4, containing 5-methylcytosine.
See Figure 10B. The unmethylated probe was assembled
with two complementary oligonucleotides of identical
sequence to the oligonucleotides of Sequence I.D. No. 3
and Sequence I.D. No. 4, except that cytosine replaced
5-methylcytosine. Labeling of the probes was conducted
as above. DNA binding reactions were carried out in 10
mM Tris-HCl pH 7.5, 50 mM NaCl, 5% glycerol, 0.5 mM
EDTA, 0.5 mM DTT, in the presence of 0.5 mg of
polydA/polydT (ICN) as non-specific competitor DNA [S.
Buratowski and L.A. Chodosh, in Current Protocols in
Molecular Biology, eds. F. M. Ausubel, et al., John
Wiley & Sons, New York (1996)]. Bacterially expressed
and purified MBD (20 ng) was incubated with the






32P-labeled double-strand oligonucleotides (20,000 cpm,
0.2 ng) on ice for 30 min. For competition, the MBD was
pre-incubated on ice for 20 min with a 100-fold excess
of the cold oligonucleotide (20 ng) prior to addition of
the probe. Binding reactions were loaded on a 10%
acrylamide gel and run at 4°C in 0.5x TBE. Dried gels
were subjected to autoradiography.
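
The competitor amounts quoted in the two binding assays follow from the
stated fold excess. The short Python sketch below illustrates the
arithmetic; the function names are hypothetical, and it assumes that
labeled and unlabeled oligonucleotides have the same length, so that a
mass ratio equals a molar ratio.

```python
# Illustrative arithmetic for cold-competitor amounts in the binding assays.
# Assumption: probe and competitor are the same oligonucleotide, so the
# fold excess by mass equals the fold excess by moles.

def cold_competitor_ng(probe_ng, fold_excess):
    """Mass of unlabeled oligonucleotide needed for a given fold excess."""
    return probe_ng * fold_excess


def implied_probe_ng(competitor_ng, fold_excess):
    """Probe mass implied by a stated competitor mass and fold excess."""
    return competitor_ng / fold_excess


# MBD assay: 0.2 ng of probe competed with a 100-fold excess.
mbd_competitor = cold_competitor_ng(0.2, 100)

# Flag-MED1 assay: 100 ng of competitor is described as a 100-fold excess.
flag_probe = implied_probe_ng(100, 100)

print(mbd_competitor, flag_probe)
```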
H. Co-immunoprecipitation analysis 

To analyze the interaction of MED1 with hMSH2,
following transient transfection of 293 cells with the
constructs of the invention, and lysis of cells after a
72 hour period, proteins were immunoprecipitated with
anti-Flag antibody as described above.
Immunoprecipitates were resuspended in
Laemmli buffer, boiled for 10 minutes, separated on 8.5%
SDS-PAGE and transferred to Immobilon P membranes.
Western blotting was carried out as described above,
using an antibody against hMSH2.

For analysis of the interaction of MED1 with hMLH1,
HEK-293 cells were cultured at 37°C and 7.5% CO2 in
Dulbecco's modified Eagle's minimum essential medium
(DMEM) supplemented with 10% fetal calf serum,
penicillin (50 units/ml), streptomycin (50 µg/ml), and
kanamycin (100 µg/ml). Cells seeded in 100-mm Petri
dishes were transfected using LipofectAMINE (Life
Technologies, Inc.) according to the manufacturer's
protocol. Seventy-two hours later, cells were lysed on
ice in one of three lysis buffers, containing 0.5%
Nonidet P-40 (NP-40) [K. Datta et al., Mol. Cell. Biol.
15: 2304-2310 (1995)], 0.2% NP-40 [W. Gu, K. Bhatia,
I.T. Magrath, C.V. Dang, R. Dalla-Favera, Science 264:
251-254 (1994)], or 1% Triton X-100 [S. F. Law et al.,
Mol. Cell. Biol. 16: 3327-3337 (1996)]; NP-40 lysates
were mildly sonicated using a sonic dismembrator
(Fisher). Immunoprecipitations were carried out with
the anti-hemagglutinin tag antibody HA.11 coupled to
beads (Berkeley Antibody Company). Immune complexes
were washed with lysis buffer, and the proteins were
resolved by 8.5% SDS-polyacrylamide gel electrophoresis
(SDS-PAGE) and transferred to PVDF membranes (Immobilon
P, Millipore) with an X-genie electroblotter (Idea
Scientific). Membranes were probed with an anti-MLH1
antibody (Pharmingen) and the HA.11 antibody (Berkeley
Antibody Company). Detection of antigen-bound antibody
was carried out using enhanced chemiluminescence (ECL,
Amersham), according to the manufacturer's protocol.
See Figure 11C.

I. Expression of the MED1 catalytic (endonuclease)
domain in E. coli

The nucleic acid sequence encoding the catalytic
domain of MED1 was cloned in the vector pET28b (Novagen)
as a carboxy terminal fusion to a 6xHis tag for
expression in E. coli. This construct was transferred
to the E. coli strain BL21(DE3)pLysS. Overnight
cultures were diluted 1:15 in fresh medium and incubated
for one hour in a 37°C incubator. Expression of the
construct was induced by addition of 1 mM IPTG for an
additional 3 hours at 37°C. Cells were then collected
by centrifugation and lysed in Laemmli buffer. Lysates
were boiled for 10 minutes and separated on 12%
SDS-PAGE. Proteins were visualized by Coomassie blue
staining.

J. Activity staining of the MED1 endonuclease domain
after sodium dodecyl sulfate-polyacrylamide gel
electrophoresis

Activity staining of MED1 was performed essentially
as described by Blank et al. (Blank et al. (1982)
Analytical Biochemistry 120: 267-275). Briefly,
bacterial lysates expressing the MED1 catalytic domain
were separated in SDS-polyacrylamide gels (12%)
containing 0.15 mg/ml heat-denatured calf thymus DNA.
Following electrophoresis, the gel was incubated in a
buffer containing 10 mM Tris-HCl pH 7.4 and 25%
isopropanol for one hour at room temperature with one
change of buffer every twenty minutes. After the first
hour, the gel was immersed in a buffer containing 10 mM
Tris-HCl pH 7.4 for an additional hour with buffer
changes every twenty minutes. The gel was then immersed
in a buffer containing 10 mM Tris-HCl, pH 7.4, 10 mM
MgCl2, 5 mM CaCl2, 2 µM ZnCl2 for 16 hours at room
temperature to allow digestion of DNA. DNA was
visualized by staining the gel with 0.2% toluidine blue
O in 10 mM Tris-HCl pH 7.4, followed by destaining in 10
mM Tris-HCl pH 7.4 for one hour at room temperature with
one change of buffer every 20 minutes. Deoxyribonuclease
activity results in a zone of clearing indicating
reduced DNA staining (Blank et al., (1982) supra).
K. Endonuclease activity of recombinant wild-type MED1.

The entire wild-type MED1 (codons 1-580, wt) and a
deletion mutant lacking the endonuclease domain (codons
1-454, Δendo) were expressed in bacteria and purified by
nickel-agarose chromatography. For bacterial
expression, PCR-generated fragments corresponding to the
entire MED1 open reading frame or to isolated domains
were cloned in pET28(b) (Novagen) and propagated in E.
coli strain XL-1 Blue (Stratagene). Constructs were
sequenced with an automated DNA sequencer (ABI) to
verify that unwanted mutations were not inadvertently
introduced, and they were transferred into E. coli
strain BL21(DE3)pLysS. These cells were grown to
O.D.600 = 0.4 and then induced with 1 mM IPTG at 37°C for
3 hours. Bacterial lysates were purified over a
nickel-agarose column (Ni2+-NTA agarose, Qiagen).
Increasing amounts of the wild-type and Δendo mutant
(22, 44, 87.5 and 175 ng) were incubated with 500 ng of
the 3.9 kb supercoiled plasmid pCR2 (Invitrogen) at 37°C
for 30 min in a buffer containing 20 mM Tris-HCl pH 7.5,
25 mM KCl and 10 mM MgCl2. Reaction products were
separated on a 1% agarose gel buffered in 1x TAE and
containing 0.25 µg/ml ethidium bromide.
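
To put the protein titration above on a molar footing, the masses can be
converted to picomoles. The Python sketch below is a rough illustration
under stated assumptions that are not taken from this specification: an
average of 650 g/mol per base pair of double-stranded DNA, an average of
110 g/mol per amino acid residue, and a 580-residue protein with the
histidine tag ignored. The function names are hypothetical.

```python
# Rough molar conversion for the plasmid-nicking titration.
# Assumed averages (not from the specification):
AVG_BP_MASS = 650.0        # g/mol per base pair of double-stranded DNA
AVG_RESIDUE_MASS = 110.0   # g/mol per amino acid residue


def pmol_dna(mass_ng, length_bp):
    """Picomoles of a double-stranded DNA of the given length."""
    return mass_ng * 1e3 / (length_bp * AVG_BP_MASS)


def pmol_protein(mass_ng, n_residues):
    """Picomoles of a protein with the given number of residues."""
    return mass_ng * 1e3 / (n_residues * AVG_RESIDUE_MASS)


plasmid_pmol = pmol_dna(500, 3900)            # 500 ng of a 3.9 kb plasmid
protein_doses_ng = [22, 44, 87.5, 175]        # amounts stated in the text
molar_ratios = [pmol_protein(ng, 580) / plasmid_pmol for ng in protein_doses_ng]

for ng, ratio in zip(protein_doses_ng, molar_ratios):
    print(f"{ng} ng protein: about {ratio:.1f} protein molecules per plasmid")
```

Under these assumptions every dose supplies more than one protein
molecule per plasmid molecule, so the assay is a titration in enzyme
excess rather than a catalytic turnover measurement.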

Identification and Characterization of MED1 
To facilitate efforts to identify eukaryotic
functional homologues of the E. coli MutH endonuclease,
the yeast interaction trap assay, a cloning strategy
which screens for protein-protein interactions in the
yeast S. cerevisiae (Golemis et al., 1996, supra), was
employed. This strategy was based on the rationale that
the human mismatch repair endonuclease would interact
with hMLH1, the human MutL homologue, in a comparable
way to what is observed in bacteria, where the
endonuclease MutH interacts with MutL. The complete
coding sequence of hMLH1 (amino acids 1-756) was fused
to the carboxy terminus of the DNA binding domain of
LexA. This construct ("bait") was introduced along with
the appropriate reporter plasmid in the yeast strain
EGY191. EGY191, which harbors only two LexA operators
directing transcription of the chromosomal LEU2 gene,
was used because in initial experiments employing the
standard EGY48 strain, the bait protein had constitutive
transcriptional activity (data not shown). Western blot
analysis with an anti-LexA antibody showed that
pEG202-t-hMLH1 directs the synthesis of a product of the
size expected for a LexA-hMLH1 bait protein in EGY191.
In control experiments, performed following standard
procedures, this protein was transported to the nucleus
and did not activate transcription of the chromosomal
LEU2 gene or of the episomal LacZ gene (data not
shown). The EGY191/pSH18-34/pEG202-t-hMLH1 yeast cells
were supertransformed with a human fetal brain cDNA
library (approximately 4 x 10^5 recombinants) fused to the
B42 portable activation domain, and colonies growing on
selective leucine-minus plates in the presence of
galactose but not glucose as carbon source were
isolated. Twenty-two clones (f1 to f22) encoding
putative hMLH1 interactors were selected. One clone,
designated f5 (later named MED1), was identified which
strongly interacted with hMLH1, based on the early
appearance of colonies on selective
leucine-minus/galactose plates and on the intensity of
color formation of colonies grown on indicator
X-Gal/galactose plates. The specificity of the f5-hMLH1
interaction was assayed by supertransforming virgin
EGY191/pSH18-34/pEG202-t-hMLH1 cells with f5 plasmid
DNA. As a control, EGY191/pSH18-34 cells transformed
with bait constructs of pEG202-bicoid, -MYC, -K-rev, and
empty pEG202 vector were also supertransformed with f5
DNA. Cells transformed with the combination of f5 and
pEG202-t-hMLH1 grew on leucine-minus/galactose but not
leucine-minus/glucose medium and turned blue on
X-Gal/galactose but not X-Gal/glucose plates. Control
cells failed to grow on leucine-minus/galactose and to
turn blue on X-Gal/galactose plates, confirming
specificity of the interaction between f5 and hMLH1 as
shown in Figure 1.

Initial sequence analysis revealed that f5, which
was represented only once in this group of 22 putative
interactors, codes for a protein sharing homology with
several bacterial endonucleases involved in DNA repair.
Since the f5-encoded protein is a putative DNA repair
enzyme, its expression is expected to be ubiquitous. A
Northern blot containing mRNA from multiple tissues was
probed with the entire 0.8 kb insert of the f5 clone.
This analysis revealed that, consistent with a putative
housekeeping role in DNA repair, the f5 gene is
expressed in all normal tissues tested with a transcript
of approximately 2.4 kb. See Figure 2.

In order to clone the remaining portion of the
gene, an f5-derived probe was used to screen four
additional cDNA libraries, three from fetal brain and
one from the ovarian cancer cell line C200. Six clones
were isolated from the fetal brain libraries and 11 from
the C200 library. These clones were sequenced.
Overlapping sequences were aligned until the nearly
complete sequence of the gene was determined (2.1 kb).
See Figure 3. The MED1 transcript contains an open
reading frame of 1740 bases, preceded by an in-frame
stop codon, which predicts a protein of about 580 amino
acids encoded by the sequence of Sequence I.D. No. 2.
Slight sequence variations were observed between the
cDNA clones analyzed. These are set forth below:

SEQUENCE VARIATIONS

1) Nucleotides 1325-1342: the 18 nucleotides
GTGAGAAAATATTTCAAG are either present (as in Sequence
I.D. No. 1) or absent (as in Sequence I.D. No. 23) from
the cDNA; therefore the 6 amino acids encoded by those
nucleotides (GEKIFQ) are either present (as in Sequence
I.D. No. 2) or absent (as in Sequence I.D. No. 24) in
the predicted protein. This variation appears to
originate from alternative usage of a splice donor site.
In the genomic DNA sequence:

...GACTTCACTGGTGAGAAAATATTTCAAGGT...

If the second splice donor site (the GT immediately 3'
of these 18 nucleotides) is used, then the 18
nucleotides GTGAGAAAATATTTCAAG are incorporated in the
mRNA; if the first splice donor site (the GT at the 5'
end of these 18 nucleotides) is used, then the same 18
nucleotides are spliced out and are not incorporated in
the mRNA.

2) Nucleotide 1876: T (as in Sequence I.D. No. 1) or C
(as in Sequence I.D. No. 25); therefore codon 579 is
either TTA or CTA (no amino acid variation, since both
code for leucine).

3) Nucleotide 2042: C (as in Sequence I.D. No. 1) or T
(as in Sequence I.D. No. 26) (no amino acid variation,
since this change is in the 3' untranslated region).

4) Poly-A tail: added after nucleotide 2106 (as in
Sequence I.D. No. 1) or approximately 150-200 bases
downstream (precise site not determined); this variation
probably originates from an alternative polyadenylation
signal.

5) Nucleotide 1214: T (as in Sequence I.D. No. 1) or
C (as in Sequence I.D. No. 27); therefore codon 358 is
either ATC or ACC, coding for isoleucine or threonine,
respectively. This sequence variation is described in
more detail in relation to Example II.
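
The codon assignments in the list above can be cross-checked against the
position of the first ATG (nucleotide 142, as stated later in this
example). The following Python sketch is illustrative only; the function
names are hypothetical, and the codon table is the standard genetic
code. It maps a nucleotide position to a codon number and a position
within the codon, and translates the genomic fragment of variation 1 in
the reading frame implied by that numbering.

```python
# Illustrative cross-check of the sequence variations listed above.
ORF_START = 142  # nucleotide position of the first ATG, per the text

# Standard genetic code, indexed in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_of(nt, orf_start=ORF_START):
    """Return (codon number, position within codon), both 1-based."""
    offset = nt - orf_start
    return offset // 3 + 1, offset % 3 + 1


def translate(seq):
    """Translate complete codons of seq; '*' marks a stop codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


GENOMIC = "GACTTCACTGGTGAGAAAATATTTCAAGGT"   # fragment quoted in variation 1
INSERT = "GTGAGAAAATATTTCAAG"                # nucleotides 1325-1342

# The insert starts at index 10 of the fragment, so index 0 is nucleotide
# 1315, which is the first position of a codon in the reading frame.
first_nt = 1325 - GENOMIC.find(INSERT)
long_isoform_peptide = translate(GENOMIC[:27])

print(codon_of(first_nt), long_isoform_peptide)
print(codon_of(1876), translate("TTA"), translate("CTA"))
print(codon_of(1214), translate("ATC"), translate("ACC"))
```

In this frame the retained nucleotides contribute the residues GEKIFQ,
and nucleotides 1876 and 1214 fall in codons 579 and 358, in agreement
with the list.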
Analysis of the predicted MED1 protein sequence
reveals a tripartite structure. At the amino terminus,
MED1 contains a region of homology to the methyl-CpG
binding domain (MBD) of MeCP2, a chromosomal protein
which binds CpG-methylated DNA and may mediate the
effects of DNA methylation on chromatin structure and
transcription (Lewis et al., (1990) Cell 69:905-914; Nan
et al., (1993) Nucleic Acids Res. 21:4886-4892). The
same region of MED1 is also homologous to the MBD of the
human protein PCM1, a component of the transcriptional
repressor MeCP1 (Cross et al., (1997) Nat. Genet.
16:256-259). The central portion of MED1 does not
display a recognizable domain structure, but it appears
to be rich in positively-charged amino acids, often
arranged in short clusters which might represent nuclear
localization signals (Boulikas, T., (1993) Critical Rev.
in Eukaryotic Gene Expression 3:193-227). Finally, at
the carboxy terminus, MED1 contains a putative catalytic
domain sharing homology with several bacterial
endonucleases involved in DNA repair, including MutY and
endonuclease III from E. coli, ultraviolet endonuclease
from Micrococcus luteus, and the putative endonuclease
encoded by the ORF10 of the thermophilic archaeon
Methanobacterium thermoformicicum. See Figure 4A, 4B and
4C. A schematic of the domain organization of MED1 is
shown in Figure 5.

In order to confirm that the MED1 open reading
frame is capable of directing the synthesis of a protein
product, a construct of MED1 in the vector pcDNA3 was
employed in an in vitro coupled transcription and
translation assay. The result indicated that the MED1
open reading frame drives the translation of two
polypeptides of 70 and 65 kD, shown in Figure 6, in good
agreement with the molecular weight predicted from the
amino acid sequence. The synthesis of these two
polypeptides might be the result of initiation from the
two close ATG codons, at nucleotide positions 142 and
262, respectively. Such a possibility is known to occur
as a result of "leaky" ribosome scanning and is
increased by a suboptimal Kozak context (Kozak, M.,
(1995) Proc. Natl. Acad. Sci. 92:2662-2666). The
difference in molecular weight (5 kD) would be compatible
with the distance between the two ATG codons (40 amino
acids).
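
The relation between the spacing of the two ATG codons and the observed
size difference can be sketched numerically. The Python lines below are
an illustration only; the average residue mass of 110 Da is a common
rule of thumb and is an assumption, not a value taken from this
specification.

```python
# Illustrative arithmetic for the two candidate initiation codons.
ATG1, ATG2 = 142, 262          # nucleotide positions given in the text
ORF_LENGTH_NT = 1740           # open reading frame length given in the text
AVG_RESIDUE_MASS_DA = 110      # assumed average mass per amino acid

in_frame = (ATG2 - ATG1) % 3 == 0
extra_residues = (ATG2 - ATG1) // 3
extra_mass_kd = extra_residues * AVG_RESIDUE_MASS_DA / 1000
orf_codons = ORF_LENGTH_NT // 3

print(in_frame, extra_residues, extra_mass_kd, orf_codons)
```

With this assumption the 40 additional residues correspond to roughly
4.4 kD, which is of the same order as the 5 kD difference between the
two in vitro translation products.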

To determine if two MED1 proteins are also
synthesized in vivo, a hemagglutinin epitope was fused
at the carboxy terminal end of the MED1 open reading
frame, generating the construct MED1-HT. Constructs were
also generated which fused a hemagglutinin tag
immediately before each of the two putative initiation
codons (HT-MED1-M1 and HT-MED1-M2). These constructs
were transiently transfected in NIH-3T3 cells and
lysates of the transfectants were prepared and separated
by SDS-PAGE. Western analysis with an
anti-hemagglutinin tag antibody revealed the presence of
a band of approximately 72 kD in cells transfected with
the carboxy-terminally tagged MED1-HT. This band
comigrates with the one present in HT-MED1-M1
transfectants, indicating that the first ATG at
nucleotide position 142 is the initiation codon in vivo.
See Figure 7. Since the expression level of HT-MED1-M1,
which uses the hemagglutinin tag ATG codon, is much
higher than that of MED1-HT, which uses the autologous
ATG codon, it is possible that the expression of the
MED1 protein is under tight translational control.

Finally, the MED1 gene was mapped with fluorescence 
in situ hybridization to human chromosome 3q21. See 
Figure 8. 

In order to determine whether MED1 has endonuclease
activity, the catalytic (endonuclease) domain was
expressed in E. coli as a carboxy terminal fusion to a
6xHis tag. High levels of expression of the domain as
a polypeptide of approximately 18-22 kD were achieved.
See Figure 9A, left panel. Bacterial lysates expressing
the catalytic domain were separated in an activity
polyacrylamide gel containing denatured calf thymus DNA.
Following electrophoresis, the gel was incubated in a
Tris-buffered solution containing 25% isopropanol and
then in Tris buffer alone to allow digestion of DNA.
DNA was visualized by staining the gel with toluidine
blue O. Results revealed a zone of clearing,
indicative of DNA digestion, migrating at approximately
18-22 kD in E. coli lysates expressing the endonuclease
domain but not in control lysates. See Figure 9A, right
panel. This experiment indicates that the recombinant
endonuclease domain of MED1 displays deoxyribonuclease
activity.

To better define its nuclease properties, the
entire MED1 protein was expressed in E. coli as a
carboxy terminal fusion to a six-histidine tag and
purified on a nickel-agarose column to approximately 95%
homogeneity. See Figure 9B, left panel. Endonuclease
activity was assayed by evaluating the conversion of a
supercoiled plasmid into open circles (nicked) and
linear molecules. Increasing amounts of the purified
MED1 protein were incubated with supercoiled plasmid DNA
at 37°C for 30 min, and the products of the reactions,
separated on a 1% agarose gel, were visualized by
ethidium bromide staining. Incubation with MED1
resulted in a dose-dependent appearance of nicked and
linearized molecules (Fig. 9B, right panel). In order
to rule out the possibility that a bacterial
endonuclease activity copurifying with MED1 is
responsible for the observed effects, a deletion mutant
lacking the putative endonuclease domain was also
purified. This mutant failed to produce nicked and
linearized DNA molecules (Fig. 9B, right panel). These
results indicate that MED1 has single- and double-strand
endonuclease activity. Digestion of the MED1-linearized
plasmid with the restriction enzyme EcoRI, which
performs two closely spaced cuts on this plasmid,
resulted in the appearance of a smear, indicating that
MED1 does not have preferential cutting sites on this
substrate. The production of linear molecules by MED1
in the above assay is intriguing. The kinetics suggest
rapid counter-nicking of the second strand across from
a site where the first nick is formed. It will be
interesting to determine whether the MED1 nicks occur in
CpG-rich regions and whether cytosine methylation
inhibits the second nicking event.

To assess whether the MED1 methyl-CpG binding
domain (MBD) is able to bind methylated DNA, a FLAG
epitope was fused at the amino terminal end of the MED1
open reading frame, generating the construct FT-MED1/f5,
and this construct was transfected into the human kidney
line 293. Cells were also transfected with the empty
expression vector. Seventy-two hours after
transfection, cells were lysed and the lysates were
immunoprecipitated with an anti-Flag antibody coupled to
agarose beads. Bound protein was eluted from the beads
following incubation with a FLAG peptide. The
FT-MED1/f5 and control eluates were incubated with a
32P-labeled double-stranded oligonucleotide containing a
total of five fully methylated CpG sites, in the
presence or absence of a 100-fold excess of the
unlabeled or "cold" oligonucleotide. The binding
reactions were separated on a non-denaturing
polyacrylamide gel and detected by autoradiography of
the dried gel. A slowly migrating band was detected in
the FT-MED1/f5 eluate lanes, but not in the control
lane. This band was abolished by competition with
excess cold oligonucleotide. This experiment indicated
that the MBD of MED1 functions as a specific methylated
DNA binding domain in vivo. See Figure 10A.

To further characterize the DNA binding properties
of MED1, its putative methyl-CpG binding domain (MBD)
was expressed in E. coli as a carboxy terminal fusion to
a six-histidine tag, and it was purified by
metal-chelating affinity chromatography followed by
ion-exchange chromatography on SP Sepharose (Pharmacia).
The purity of the MED1 MBD was estimated at >98% by
SDS-PAGE followed by Coomassie staining. The purified
MBD was incubated with a 32P-labeled double-strand
oligonucleotide of arbitrary sequence containing five
symmetrical methyl-CpG sites. As a control, MBD was
incubated with a 32P-labeled double-strand
oligonucleotide of identical sequence in which cytosines
replaced methyl-cytosines. EMSA analysis of the
complexes indicated that the MED1 MBD binds to
methylated DNA and fails to bind to unmethylated DNA
(Fig. 10B, lanes 2 and 6). Binding to the methylated
probe was competed by preincubation with a 100-fold
excess of cold methylated oligonucleotide (lane 3).
Little competition was observed following preincubation
with the unmethylated oligonucleotide (Fig. 10B, lane
4). This experiment provides further evidence of the
methyl-CpG binding specificity of the MED1 MBD.

The physical association of MED1 with other DNA
repair proteins was assessed as follows. 293 cells were
transfected with the construct FT-MED1/f5 or with an
empty expression vector. Seventy-two hours after
transfection, cell lysates were prepared and
immunoprecipitations carried out with anti-FLAG
antibodies coupled to agarose beads. Immunoprecipitated
proteins were separated by SDS-PAGE, transferred to
membrane and probed with anti-hMSH2 antibody. The
antibody detected a band of approximately 103 kD
comigrating with hMSH2 in the anti-FLAG
immunoprecipitate from FT-MED1/f5-transfected 293 cells
but not from control cells. See Figures 11A and 11B.
This experiment demonstrates the physical association of
MED1 in a complex with hMSH2.

In order to confirm that the MLH1/MED1
interaction detected in yeast also occurs in human
cells, co-immunoprecipitation experiments were
performed. Human kidney HEK-293 cells were transfected
with a hemagglutinin-tagged construct of MED1 (HT-MED1)
or with an empty expression vector. Seventy-two hours
after transfection, cell lysates were prepared and
immunoprecipitations were carried out with an antibody
directed against the hemagglutinin tag.
Immunoprecipitated proteins were separated by SDS-PAGE,
transferred to a membrane and probed with an anti-MLH1
monoclonal antibody. The antibody detected a band of
approximately 82 kD co-migrating with MLH1 in the
anti-hemagglutinin immunoprecipitate from
HT-MED1-transfected HEK-293 cells but not from control
cells (Fig. 11C). This experiment suggests that MED1 is
present in a complex with MLH1.

EXAMPLE II 

Identification of Mutations in MED1 in HNPCC patients

Mutational screening of the MED1 gene has been
performed in ten HNPCC patients. Earlier studies on
these patients revealed that they were negative for
hMSH2 and hMLH1 mutations (Viel et al., (1997) Genes
Chromosom Cancer 18:8-18). Polymerase chain reaction
(PCR) amplification of MED1 fragments with MED1-specific
primer oligonucleotides (provided in Table I) has been
performed, followed by direct sequencing of PCR products.
A sequence variant which converts isoleucine 358 to
threonine (I358T) has been identified in the germ-line
of a female patient affected by two independent
synchronous colon cancers. Analysis of one of the
cancers revealed the loss of the normal allele. This
finding is in agreement with a possible tumor suppressor
role of MED1. The I358T variant is presently being
sought in other affected and unaffected individuals of
the family to determine if it cosegregates with the
disease. Thus, the I358T variant is present at a
frequency of 1 out of 10 HNPCC patients (10%). This
variant is also present in the general population at a
lower frequency of approximately 3 out of 69 individuals
(4.3%). Taken together these findings suggest that the
I358T variant of MED1 may be associated with an
increased risk for colon cancer.

EXAMPLE III 
Screening Cancer Patient DNA Samples 
for Mutations in MED1

A panel of 14 sporadic colorectal cancers with
microsatellite instability but with no detectable defect
in the two major mismatch repair genes, hMSH2 and hMLH1
(Y. Wu et al., Genes Chromosomes and Cancer 18: 269
(1997)), was screened for mutations by PCR amplification
of all the MED1 exons from genomic DNA, followed by direct
sequencing of PCR products with an automated DNA
sequencer (ABI), using the primers shown in Table I.
Sequence analysis revealed MED1 mutations in 4 of 14
(28.6%) tumors. In all four of these tumors, a one-base
deletion occurred in one of two mononucleotide repeats
[(A)6 and (A)10] located in the coding region of MED1
(Fig. 13A and 13B). Mutations were confirmed by
sequencing at least three independent PCR products on
both strands; the mutations were somatic, as they were
not detected in the corresponding peripheral blood DNA.
The one-base deletions cause frameshifts and predict the
synthesis of truncated proteins (Fig. 13C). These
alterations resemble the frameshift mutations described
in the (A)8 and (C)8 tracts present in the coding region
of the mismatch repair genes MSH3 and MSH6, respectively
(S. Malkhosyan et al., Nature 382: 499 (1996)).
Furthermore, these alterations appear to be selected for
in tumor cells, as similar (A)n mononucleotide repeats,
including the (A)8 stretch in the coding region of PMS2,
are not altered in this tumor panel. Similarly,
preliminary screening experiments of 26 endometrial
cancer patients led to the identification of a mutation
in MED1.
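
The way a one-base deletion in a mononucleotide repeat shifts the
reading frame and introduces a premature stop codon can be illustrated
with a toy sequence. In the Python sketch below, the coding sequence
TOY_CDS is entirely hypothetical (it is not the MED1 sequence) and was
chosen only so that it contains an (A)10 tract; the function names are
likewise hypothetical, and the codon table is the standard genetic
code.

```python
# Toy illustration of a frameshift caused by an (A)10 -> (A)9 deletion.
# TOY_CDS is a made-up sequence, not taken from MED1.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_to_stop(seq):
    """Translate seq codon by codon, ending at the first stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def delete_one_from_tract(seq, base="A", length=10):
    """Remove one base from the first run of `length` copies of `base`."""
    start = seq.find(base * length)
    if start < 0:
        raise ValueError("tract not found")
    return seq[:start] + seq[start + 1:]


TOY_CDS = "ATGGCT" + "A" * 10 + "CCTGTATGACGGTTCATTAA"
MUTANT_CDS = delete_one_from_tract(TOY_CDS)

wild_type_protein = translate_to_stop(TOY_CDS)
mutant_protein = translate_to_stop(MUTANT_CDS)

print(wild_type_protein, mutant_protein)
```

In this toy case the residues upstream of the tract are unchanged, the
residues downstream are altered, and translation ends early, which is
the general pattern described for the deletions above.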

TABLE II

Patient  Sex  Tumor Site        Age at     MED1 Mutation   Codon    Result
                                Diagnosis

c18T     M    caecum            83         (A)10 to (A)9   310-313  frameshift and stop at codon 317
C220T         transverse colon  79         (A)10 to (A)9   310-313  same as above
C226T         ascending colon   70         (A)10 to (A)9   310-313  same as above
c215T                           66         (A)6 to (A)5    280-282  frameshift and stop at codon 317
UPN252T  F    endometrium       N/A        (A)10 to (A)9   310-313  frameshift and stop at codon 317



Discussion 

Two long-standing and closely related issues in
eukaryotic mismatch DNA repair are identifying the
endonuclease activity responsible for incising the DNA
strand carrying the mutation, and defining the nature of
the strand-targeting signal. In E. coli, MutH performs
this function through the recognition of hemimethylated
d(GATC) sites. However, eukaryotic functional homologues
of MutH are not currently known. Due to the lack of
information on the molecular determinants of strandedness,
it was hypothesized that a reasonable approach towards the
cloning of eukaryotic MutH functional homologues would be
to identify hMLH1 interactors. By analogy with the
MutL-MutH interaction in the bacterial system, the
eukaryotic mismatch repair endonuclease is expected to be
an hMLH1 interactor.

Accordingly, the "interaction cloning" of MEDl, a gene 
encoding a viable candidate for the mismatch repair 
endonuclease is described in the previous examples. The 

MED1 protein has several features compatible with such a
role. MED1 specifically interacts with hMLH1 in the yeast
system and mammalian cells, and with hMSH2 in a mammalian
cell system. Whether MED1 interacts with other components
of the mismatch repair complex, such as hMSH3, hMSH6/GTBP
and hPMS2, has yet to be determined. MED1 has a catalytic
domain showing homology to several bacterial DNA repair
endonucleases, and it is predicted that MED1 would have
N-glycosylase and possibly apurinic or apyrimidinic (AP)
lyase activities. Among the MED1 homologues, both the E.
coli MutY and endonuclease III, and the M. luteus UV-repair
endonuclease have DNA N-glycosylase and AP endonuclease
activities. Interestingly, MutY is active on A.C, A.G and
A.8-oxoG mismatches, whereas endonuclease III is active on
mismatches containing some damaged derivatives of thymidine
and cytosine. The homology between MED1 and the
ORF10-encoded protein of M. thermoformicicum (Nolling et
al., (1992) Nucleic Acids Res. 20:6501-6507) is
particularly intriguing. It has been proposed that this
open reading frame encodes a mismatch DNA repair enzyme,
functionally associated with the methylase of the M.
thermoformicicum restriction/modification system. ORF10
would be active on G/T mismatches arising from deamination
of 5-methyl-cytosine, a product of the methylase, to
thymidine under thermophilic conditions. Spontaneous
deamination of 5-methyl-cytosine in CpG dinucleotides to
thymidine (G.m5C -> G.T) is a source of endogenous mutations
in the human genome (Rideout et al., (1990) Science
249:1288-1290). Almost 50% of the p53 point mutations in
colorectal cancer are transitions at CpG dinucleotides
(Greenblatt et al., (1994) Cancer Res. 54:4855-4878).
Conservation of MED1-related sequences involved in mismatch
repair in organisms belonging to two distant phyla
(Eubacteria and Archaebacteria) suggests that human MED1 is
an endonuclease active on DNA mispairs.
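
The G.m5C -> G.T transition described above may be
illustrated computationally. The following Python sketch is
offered by way of example only and forms no part of the
disclosed methods; the function names (cpg_sites, deaminate,
mispairs) and the test sequence are hypothetical. It locates
CpG dinucleotides on one strand, models deamination of the
cytosine to thymine, and reports the resulting mispair
against the unchanged opposite strand.

```python
# Illustrative sketch only: names and the example sequence are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Return the base-by-base complement (not reversed) of a DNA string."""
    return seq.upper().translate(COMPLEMENT)


def cpg_sites(seq):
    """Return 0-based indices of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def deaminate(seq, pos):
    """Model m5C -> T at the C of a CpG located at index `pos`."""
    seq = seq.upper()
    if seq[pos:pos + 2] != "CG":
        raise ValueError("position is not the C of a CpG dinucleotide")
    return seq[:pos] + "T" + seq[pos + 1:]


def mispairs(top, bottom):
    """List (index, 'bottom.top') for positions that are not Watson-Crick pairs.

    `bottom` is written 3'->5' so that it is index-aligned with `top`.
    """
    return [
        (i, f"{b}.{t}")
        for i, (t, b) in enumerate(zip(top.upper(), bottom.upper()))
        if complement(t) != b
    ]


if __name__ == "__main__":
    top = "ACGTCGA"
    bottom = complement(top)          # opposite strand, index-aligned
    damaged = deaminate(top, cpg_sites(top)[0])
    print(mispairs(damaged, bottom))  # one G.T mispair at the deaminated CpG
```

In this toy model the guanine on the opposite strand is left
untouched, so the only mispair reported is the G.T at the
deaminated position, which is the substrate proposed for the
ORF10-encoded enzyme.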

A common feature of the MED1-related endonucleases is
the presence of a Cys-X6-Cys-X2-Cys-X5-Cys sequence at
their carboxy terminus. This sequence, as shown in
endonuclease III, ligates the [4Fe-4S] iron-sulfur cluster
and defines a novel DNA binding motif (named the FCL
motif), which provides the correct alignment of the enzyme
along the DNA (Thayer et al., (1995) EMBO J. 14:4108-4120).
MED1 lacks an FCL motif at its carboxy terminus, but
contains a methyl-CpG DNA binding domain at the amino
terminus.
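
The Cys-X6-Cys-X2-Cys-X5-Cys spacing described above can be
searched for in a protein sequence with a regular
expression. The following Python sketch is illustrative
only; the function name find_cys_motif and the synthetic
test sequences are assumptions, and the pattern encodes
nothing beyond the cysteine spacing stated in the text.

```python
import re

# Cys, any 6 residues, Cys, any 2 residues, Cys, any 5 residues, Cys.
CYS_MOTIF = re.compile(r"C.{6}C.{2}C.{5}C")


def find_cys_motif(protein):
    """Return (start_index, matched_segment) for each non-overlapping match.

    `protein` is a one-letter amino acid string; indices are 0-based.
    """
    return [(m.start(), m.group()) for m in CYS_MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    # Synthetic sequences, not taken from any real protein.
    with_motif = "MK" + "C" + "A" * 6 + "C" + "A" * 2 + "C" + "A" * 5 + "C" + "LL"
    without_motif = "MKCAACAACLL"
    print(find_cys_motif(with_motif))
    print(find_cys_motif(without_motif))
```

An empty result for a given sequence indicates only that
this particular spacing is absent; it says nothing about
other cysteine arrangements or other DNA binding domains.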

The presence of this methyl-CpG binding domain in MED1
suggests a mechanism for strand determination. In human
mismatch repair, strand specificity may be determined by
the MED1-mediated recognition of methyl-CpG sequences. The
newly synthesized strand would be recognized as such by
virtue of its transient lack of CpG methylation after
replication, as shown in Figure 12. In this model, cytosine
methylation in eukaryotes would be functionally equivalent
to adenine methylation in E. coli, as is the case for
methylation-mediated transcriptional repression. This
model is consistent with experimental evidence suggesting
that, in monkey CV1 cells, cytosine hemimethylation at CpG
sites may be a determinant of strandedness (Hare et al.,
(1985) Proc. Natl. Acad. Sci. 82:7350-7354). Since a nick
in one of the DNA strands is capable of efficiently
directing mismatch repair in vitro, it is also possible
that DNA termini generated at the replication fork
represent the only strand-targeting signal in organisms
lacking genome methylation, such as Drosophila and S.
cerevisiae. Accordingly, screening of the S. cerevisiae
genome database did not identify any homologue of MED1.
However, in CV1 cells, single-strand nicks were shown to
synergize with CpG hemimethylation in directing repair,
indicating that multiple mechanisms may play a role in
strand determination. Thus, our data would imply that
epigenetic modification of the genome via cytosine
methylation not only participates in X-chromosome
inactivation, imprinting, and transcriptional repression
(A. Bird, Cell 70: 5-8, 1992; R. A. Martienssen and E. J.
Richards, Curr. Opin. Genet. Dev. 5: 234-242, 1995), but is
also involved in DNA repair. Indeed, recent studies
propose that DNA methylation plays a role in maintaining
genomic stability (C. Lengauer, K. W. Kinzler, B.
Vogelstein, Proc. Natl. Acad. Sci. USA 94: 2545-2550,
1997).
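
The methylation-directed strand determination model
discussed above can be expressed as a simple decision rule.
The following Python sketch is a hypothetical illustration
only, not a description of how MED1 operates; the function
name infer_nascent_strand and the boolean encoding of
methylation status are assumptions. Each CpG site is
represented by a pair of flags, and the strand that is
unmethylated at hemimethylated sites is taken to be the
newly synthesized strand.

```python
def infer_nascent_strand(sites):
    """Infer which strand is newly synthesized from CpG methylation flags.

    `sites` is an iterable of (top_methylated, bottom_methylated) booleans,
    one pair per CpG site. Returns "top", "bottom", or None when there is no
    hemimethylated site or when the sites disagree.
    """
    sites = list(sites)
    # A site methylated only on the bottom strand marks the top strand as new.
    top_new = sum(1 for top, bottom in sites if bottom and not top)
    # A site methylated only on the top strand marks the bottom strand as new.
    bottom_new = sum(1 for top, bottom in sites if top and not bottom)
    if top_new and not bottom_new:
        return "top"
    if bottom_new and not top_new:
        return "bottom"
    return None


if __name__ == "__main__":
    # Two CpG sites: one hemimethylated (top strand only), one fully methylated.
    print(infer_nascent_strand([(True, False), (True, True)]))
```

The rule deliberately returns no answer for fully
methylated, fully unmethylated or conflicting inputs,
reflecting the statement above that other signals, such as
strand nicks, may also contribute to strand determination.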

The interpretation of the MED1 mutational data
requires some caution. Although it is presently unclear
whether MED1 mutations promote or are the consequence of
microsatellite instability, their apparent selection in
tumors suggests that they may contribute to the unfolding of
tumor genomic instability, as has been proposed for the
MSH3 and MSH6 coding microsatellite mutations (M. Perucho,
Nature Med. 2: 630-631, 1996). Due to the variable amount
of contaminating normal cells in primary tumor specimens,
it is difficult to determine the homozygous or heterozygous
nature of the MED1 mutations. Sequence analysis (Fig. 13)
shows apparent retention in the tumors of the wild-type
MED1 allele. This may indicate that the products of the
mutant alleles, which lack the endonuclease domain (Fig.
13C), act in a dominant negative fashion, perhaps competing
for methyl-CpG DNA binding. Alternatively, the
heterozygous mutations may reduce the total amount of
functional molecules (haploinsufficiency).

In summary, although the endonuclease domain of MED1
does not display significant homology to MutH, the
specific interaction with hMLH1 and the domain organization
indicate that MED1 may be a functional homologue of MutH,
i.e., a DNA repair endonuclease capable of strand
discrimination. Assuming MED1 is the long-sought
eukaryotic homologue of MutH, then, like other mismatch
repair genes which are mutated in HNPCC as well as in
sporadic cancers with microsatellite instability, MED1 is
a candidate gene for cancer genetic testing, both in HNPCC
families and in sporadic cancers with microsatellite
instability. It should be noted that only about 70% of
HNPCC cases and only about 65% of sporadic tumors with
microsatellite instability carry mutations in the known
mismatch repair genes hMSH2, hMLH1, hPMS2 and hPMS1. The
remaining 30-35% of the cases have an as yet unidentified
mismatch repair defect, and a fraction may therefore harbor
mutations or loss of expression of MED1. Indeed,
frameshift MED1 mutations were detected in both colorectal
and endometrial cancers. See Figure 13 and Table II.
While certain preferred embodiments of the present
invention have been described and specifically exemplified
above, it is not intended that the invention be limited to
such embodiments. Various modifications may be made to the
invention without departing from the scope and spirit
thereof as set forth in the following claims.
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What is claimed is: 

1. An isolated double-stranded nucleic acid
molecule which, upon denaturation, specifically hybridizes
with SEQ ID NO: 1, said nucleic acid molecule comprising a
sequence encoding a human endonuclease about 580 amino
acids in length, said encoded endonuclease comprising an
amino-terminal methyl CpG-binding domain, an internal
segment rich in positively charged amino acids and a
carboxy-terminal catalytic domain, said catalytic domain
having deoxyribonuclease activity.

2. The nucleic acid molecule of claim 1, which
is DNA.

3. The DNA molecule of claim 2, which is a cDNA
comprising a sequence approximately 2.4 kilobase pairs in
length that encodes said human endonuclease.

4. The DNA molecule of claim 2, which is a gene
comprising introns and exons, the exons of said gene
specifically hybridizing with the nucleic acid of SEQ ID
NO: 1, and said exons encoding said human endonuclease
protein.

5. The nucleic acid molecule of claim 1, which
is RNA.

6. A vector comprising the nucleic acid molecule
of claim 1.

7. A host cell comprising the vector of claim 6.

8. The nucleic acid molecule of claim 1, wherein
said nucleic acid encodes a human endonuclease protein
comprising an amino acid sequence selected from the group
consisting of an amino acid sequence encoded by SEQ ID NO:
2 and natural allelic variants of said nucleic acid.
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9. The nucleic acid molecule of claim 8, which 
comprises SEQ ID NO: 1. 

10. An isolated nucleic acid molecule comprising
a sequence selected from the group consisting of:

a) SEQ ID NO: 1;

b) a sequence which specifically hybridizes
with SEQ ID NO: 1;

c) a sequence encoding a polypeptide of SEQ
ID NO: 2; and

d) a nucleic acid sequence encoding a
catalytic domain of an endonuclease protein having an amino
acid sequence corresponding to amino acids 455-580 of SEQ
ID NO: 2.

11. An oligonucleotide between about 10 and
about 200 nucleotides in length, which specifically
hybridizes with a nucleotide sequence encoding amino acids
of SEQ ID NO: 2.

12. An oligonucleotide between about 10 and
about 200 nucleotides in length, which specifically
hybridizes with a sequence in the nucleic acid molecule of
claim 1, said sequence encoding the methyl CpG binding
domain of said endonuclease protein.

13. An isolated human endonuclease protein,
about 580 amino acids in length, said encoded protein
comprising an amino-terminal methyl CpG-binding domain, an
internal segment rich in positively charged amino acids and
a carboxy-terminal catalytic domain, said catalytic domain
having deoxyribonuclease activity.

14. An antibody immunologically specific for the
isolated protein of claim 13.

15. An antibody as claimed in claim 14, said 
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antibody being monoclonal. 

16. An antibody as claimed in claim 14, said
antibody being polyclonal.

17. A pharmaceutical composition comprising a
polypeptide as claimed in claim 13 and a pharmaceutically
acceptable carrier.

18. A pharmaceutical composition comprising an
antibody as claimed in claim 14 and a pharmaceutically
acceptable carrier.

19. A method of diagnosing a susceptibility or
predisposition to cancer in a patient caused by an
alteration in a MED1 encoding nucleic acid, wherein said
patient sample is analyzed by a method selected from the
group consisting of:

a) a method of comparing a sequence of nucleic
acid in the sample with the MED1 nucleic acid sequence to
determine whether the sample from the patient contains
mutations; and

b) a method of determining the presence, in a
sample from a patient, of a polypeptide encoded by the MED1
nucleic acid and, if present, determining whether the
polypeptide is altered; and

c) a method of DNA restriction mapping to
compare the restriction pattern produced when a restriction
enzyme cuts a sample of nucleic acid from the patient with
the restriction pattern obtained from normal MED1 gene or
from known mutations thereof; and

d) a method employing a specific binding member
capable of binding to a MED1 nucleic acid sequence, the
specific binding member comprising nucleic acid
hybridizable with the MED1 sequence; and

e) a method wherein at least one antibody domain
with specificity for an epitope selected from the group
consisting of a native MED1 nucleic acid sequence epitope,
or a polypeptide epitope, the specific binding member being
labelled so that binding of the specific binding member to
its binding partner is detectable; and

f) a method of PCR amplification involving one
or more primers based on normal and mutated MED1 gene
sequence to screen for normal and mutant MED1 gene in a
sample from a patient.

20. A method of identifying a target nucleic acid
molecule in a test sample using a nucleic acid probe having
the sequence shown in SEQ ID NO: 1, the method comprising
contacting the probe and the test sample under hybridizing
conditions and observing whether hybridization takes place.

21. A method according to claim 20 wherein the probe
is used to identify a nucleic acid selected from the group
consisting of a MED1 nucleic acid sequence and a mutant
allele thereof.

22. A kit for detecting mutations in a MED1 gene
associated with a susceptibility to cancer, the kit
comprising at least one nucleic acid probe(s) capable of
specifically binding a mutated MED1 nucleic acid.

23. A kit for detecting mutations in a MED1 gene
associated with susceptibility to cancer, the kit
comprising at least one antibody capable of specifically
binding a polypeptide encoded by a mutated MED1 nucleic
acid sequence.

24. A kit comprising a pair of oligonucleotide
primers having sequences corresponding to a portion of a
nucleic acid sequence set out in SEQ ID NO: 1 for use in
amplifying a nucleic acid selected from the group
consisting of a MED1 nucleic acid sequence and a mutant
allele thereof.
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25. A kit for determining the presence of at least 
one mutation in a sample of nucleic acid from an 
individual, the kit comprising: 

a) a solid support having immobilized thereon at least
one allelic variant specific nucleic acid probes having
sequences corresponding to portions of the sequence set out
in SEQ ID NO: 1 capable of specifically binding a mutated
MED1 nucleic acid sequence; and

b) a detectable label for marking the presence of
sample nucleic acid hybridized to the probe(s).

26. A kit for determining the presence of at least
one mutation in a sample of nucleic acid from an
individual, the kit comprising:

a) a solid support having immobilized thereon at least
one antibody capable of specifically binding a polypeptide
encoded by a mutated MED1 nucleic acid sequence; and

b) a detectable label for marking the presence of
antibodies bound to the sample polypeptides.

27. A method of screening for substances which
modulate the activity of a MED1 polypeptide, the method
comprising contacting at least one test substance with the
MED1 polypeptide in a reaction medium, testing the activity
of the treated MED1 polypeptide and comparing that activity
with the activity of native, untreated MED1 polypeptide in
a comparable reaction medium.

28. A method as claimed in claim 27, wherein said
test substance is a mimetic of the MED1 polypeptide.

29. A chimeric animal comprising an exogenous MED1 
allele. 
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LexA-hMLH1 /B42-f5 
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LexA-myc / B42-f5 
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LexA-K-rev-1 /Kritl 
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1 G<XXXJCGTCTGGGGCGCTTTCGC AACA TTCAGACCTCGGTTGCAGCCCGGTGCCG TGAGCTGAAGAGGTTTC ACA TCTTACTCCGCCCC A 
91 CACCCTCC<XXn'TCOGCCC 

MGTTGLESLSLGD 



270 



181 CGCGGAGCTGCXXrCCACCGTCACCTCTAGTGAGCGCCTAGTC^ 

RGAAPTVTSSERLVPDPPNDLRKEDVAMEL 
271 GAAAGAGTGGGAGAAGATGAGGAACAAATGATGATAAAAAGU^GCAGTGA^ 360 

ERVGEDEEQMMIKRSSECNPLLCEPIASAQ 
361 TTlXXTKXrTACTGCAGGAAC^GAATGCCCT 450 

FGATA GTECRKSVPCGWERVVKQRLFGKTA 
451 GGAAGATTTGATCrrGTACTTTATCAGCCCACAAG<^ 540 

GRFDVYFISPQGLKFRSKSSLANYLHKNGE 
541 ACTTCTCTTAAG<X*GAAGATTra 630 

TSLKPEDFDFTVLSKRGIKSRYKDCSMAAL 
€31 ACATCCCATCTACAAAACCAAAGTAACAATTCAAACTGGA^ 720 

TSHLQNQSNNSNWNLRTRSKC KKDVFMPPS 
721 ACTA<mX»GAGTTGCAGGAOAGCA<»a^CTCTCT 810 

SSSELQESROLSHFTSTHLLLKEDEGVDDV 

811 AACTTCAGAAAGGTTAGAAAGCCCAAAGGAAAGGTGACTATTTTCAAAGGA^ 900 
NFRKVRKPKGKVTILKGIP IKKTKKGCRKS 

901 TGTCAOom ' T ^^ 990 
CSGFVQSDSKRESVCNKADAESEPVAQKSO 

991 CTTGATAGAACTGTCTGCATTTCTGATGCTGGAGCATGTGGT^ 1080 

LDRTVCISDAGACGETLSVTSEENSLVKKK 
1081 GAAAGATt^TTGAGTTCAGGATCAAATTTl'lV 1170 

ERSLSSGSHFCSEQKTSGIINKFCSAKDSE 
1171 CACAACGAGAAGTATCAGGATACCTTTTTAGAATCTGAAGAAATCGG^ 1260 

HNEKYEDTFLESEEIGTKVBVVERKBHLHT 
1261 GACATTTTAAAACGTXX3CTCTGAAATGGACAACAACTGCTCACCAA 1350 

DILKRGSEMDNNCSPTRKDFTOEKIFQEDT 
1351 ATCCXACGAACACAGATAGAAAGAAGGAAAACAAG<X^ 1440 

IPRTQIERRKTfi^YFSfiKYHKBALSPPRRK 
1441 CCCTTTAAGAAATG<JACACCTCCTC^^ 1530 

AFKKWTPPRSPFHLVOETI.FHDPHKLLIAT 
1531 ATATTTCTCAATCG<^CCTCAGGCAAAATGGC\ATACCTG 1620 

IFLHRTSGKMAIPVLHKFLEKYPSAEVART 
1621 GCAGACTajAGAGATGrGTCAOA^ 1710 

ADWRDVSELLKPLGLYDLRAKTIVKFSDEY 
1711 CTGACAAAGCAGTGXSAAGTATCCAATTGAGC^^ 1800 

tTKQWKYPIELHGIOKYGMDSYRIFCVMEW 
1801 AAGCAGGTGCACCCTGAAGACC^CAAATTAAATAAATATCATGACTGGCTTT^^ 1890 

KQVHPEDHKLNKYHDWL WENHEKLSLS* 

CAGCTTTCAAGCTC^TC^ 1980 
1981 TAATTA QCCCAACT AGAAGCCTAGTCTQTQTGCTTTCTTAATQTQT^ 2070 
2071 TTGAGATTTTTTTAAAATAAATTATTATTTTSACAACAAAAAAA 2152 
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FIGURE 5 
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FIGURE 6 
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FIGURE 7 A 
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FIGURE 7B 







Fig. 10A 






Fig. 10B 
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FIGURE 11A 
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FIGURE 11B 
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1 CAAGGAAGAT ATTGCTGTTG GACTGGGAGG AGTGGGAGAA GATGGAAAGG 
51 ACCTGGTGAT AAGCAGTGAG CGCAGCTCCC TTCTCCAAGA GCCCACTGCT 
101 TCTACTCTGT CTAGTACTAC AGCGACAGAA GGCCACAAGC CTGTCCCGTG 
151 TGGATGGGAA AGAGTTGTGA AGCAAAGGTT ATCTGGGAAA ACTGCAGGAA 
201 AATTTGATGT ATACTTTATC AGCCCACAAG GATTGAAGTT CAGATCAAAA 
251 CGTTCACTTG CTAATTATCT TCTCAAAAAT GGGGAGACTT TTCTTAAGCC 
301 TGAAGATTTT AATTTTACTG TACTGCCGAA AGGGAGCATC AATCCCGGTT 
351 ATAAACACCA AAGTTTGGCA GCTCTGACTT CCCTGCAGCC AAATGAAACT 
401 GACGTTTCAA AGCAGAACCT CAAGACACGA AGCAAGTGGA AAACAGATGT 
451 GTTGCCTCTG CCCAGTGGTA CTTCAGAGTC GCCAGAAAGC AGCGGACTGT 
501 CTAACTCTAA CTCGGCTTGC TTGCTATTGA GAGAACATAG GGACATTCAG 
551 GATGTTGACT CTGAGAAGAG GAGAAAGTCC AAAAGAAAGG TGACTGTTTT
601 GAAAGGAACT GCAAGTCAGA AAACCAAACA AAAGTGCAGG AAGAGTCTCT 
651 TAGAGTCTAC TCAAAGAAAC AGAAAAAGAG CATCTGTGGT TCAGAAGGTG 
701 GGTGCTGATC GCGAGCTGGT GCCACAGGAA AGTCAACTCA ACAGAACCCT 
751 CTGCCCTGCA GATGCCTGTG CAAGGGAGAC TGTTGGCCTG GCTGGGGAAG 
801 AAAAATCACC AAGCCCAGGA CTGGATCTTT GTTTCATACA AGTAACTTCT
851 GGCACCACAA ACAAATTCCA TTCAACTGAA GCAGCAGGTG AAGCAAATCG
901 TGAGCAGACT TTTTTAGAAT CAGAGGAAAT CAGATCGAAG GGAGACAGAA 
951 AGGGGGAGGC ACATTTGCAT ACTGGTGTTT TACAGGATGG CTCTGAAATG 
1001 CCCAGCTGCT CACAAGCCAA GAAACACTTT ACTTCTGAGA CATTTCAAGA 
1051 AGACAGCATC CCACGGACAC AAGTAGAAAA AAGGAAAACA AGCCTGTATT 
1101 TTTCCAGCAA GTACAACAAA GAAGCTCTTA GCCCCCCAAG ACGCAAATCC 
1151 TTCAAGAAAT GGACCCCTCC TCGGTCACCT TTTAATCTTG TTCAAGAAAT 
1201 ACTTTTCCAT GACCCATGGA AGCTCCTCAT CGCGACTATA TTTCTCAATC 
1251 GGACCTCAGG CAAGATGGCC ATCCCTGTGC TGTGGGAGTT TCTAGAGAAG 
1301 TACCCTTCAG CTGAAGTGGC CCGAGCTGCC GACTGGAGGG ACGTGTCGGA 



Fig. 17A 
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1351 GCTTCTCAAG CCTCTTGGTC TCTACGATCT CCGTGCAAAA ACCATTATCA 

1401 AGTTCTCAGA TGAATATCTG ACAAAGCAGT GGAGGTATCC GATTGAGCTT 

1451 CATGGGATTT GGTTAAAATA TGGCAACGAC TCTACCGGAT CTTTTGTGTC 

1501 AATGAATGGA ACAG



Fig. 17B 
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mouse MED1 protein (upper sequence) x human MED1 protein 
(lower sequence) 



  1 KEDIAVGLGGVGEDGKDLVI..SSERSSLLQEPTAST.LSSTTATEGHKP 47
 36 KEDVAMELERVGEDEEQMMIKRSSECNPLLQEPIASAQFGATAGTECRKS 85

 48 VPCGWERVVKQRLSGKTAGKFDVYFISPQGLKFRSKRSLANYLLKNGETF 97
 86 VPCGWERVVKQRLFGKTAGRFDVYFISPQGLKFRSKSSLANYLHKNGETS 135

 98 LKPEDFNFTVLPKGSINPGYKHQSLAALTSLQPNETDVSKQNLKTRSKWK 147
136 LKPEDFDFTVLSKRGIKSRYKDCSMAALTSHLQNQSNNSNWNLRTRSKCK 185

148 TDVLPLPSGTSESPESSGLSNSNSACLLLREHRDIQDVDSEKRRKSKRKV 197
186 KDVFMPPSSSSELQESRGLSNFTSTHLLLKEDEGVDDVNFRKVRKPKGKV 235

198 TVLKGTASQKTKQKCRKSLLESTQRNRKRAS 228
236 TILKGIPIKKTKKGCRKSCSGFVQSDSKRESVCNKADAESEPVAQKSQLD 285

229                EDSIPRTQVEKRKTSLYFSSKYNKEALSPPRRKSF 263
386 CSPTRKDFTGEKIFQEDTIPRTQIERRKTSLYFSSKYNKEALSPPRRKAF 435

264 KKWTPPRSPFNLVQEILFHDPWKLLIATIFLNRTSGKMAIPVLWEFLELY 313
436 KKWTPPRSPFNLVQETLFHDPWKLLIATIFLNRTSGKMAIPVLWKFLEKY 485

314 PSAEVARAADWRDVSELLKPLGLYDLRAKTIIKFSDEYLTKQWRYPIELH 363
486 PSAEVARTADWRDVSELLKPLGLYDLRAKTIVKFSDEYLTKQWKYPIELH 535

364 GIWLKYGNDSYRIFCVNEWKQ 384
536 GIG.KYGNDSYRIFCVNEWKQ 555



Figure 18 
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Exon 2 

ggttttgtttttccagCAAGGAAGATATTGCTGTTGGACTGGGAGGAGTG
GGAGAAGATGGAAAGGACCTGGTGATAAGCAGTGAGCGCAGCTCCCTTCT 
CCAAGAGCCCACTGCTTCTACTCTGTCTAGTACTACAGCGACAGAAGGCC 
ACAAGCCTGTCCCGTGTGGATGGGAAAGAGTTGTGAAGCAAAGGTTATCT 
GGGAAAACTGCAGGAAAATTTGATGTATACTTTATCAGgtaagca 1 1 1 ag
gaaggaaaata 



Exon 3 

ctttttttttttccttttaagCCCACAAGGATTGAAGTTCAGATCAAAAC

GTTCACTTGCTAATTATCTTCTCAAAAATGGGGAGACTTTTCTTAAGCCT 

GAAGATTTTAATTTTACTGTACTGCCGAAAGGGAGCATCAATCCCGGTTA

TAAACACCAAAGTTTGGCAGCTCTGACTTCCCTGCAGCCAAATGAAACTG 

ACGTTTCAAAGCAGAACCTCAAGACACGAAGCAAGTGGAAAACAGATGTG 

TTGCCTCTGCCCAGTGGTACTTCAGAGTCGCCAGAAAGCAGCGGACTGTC 

TAACTCTAACTCGGCTTGCTTGCTATTGAGAGAACATAGGGACATTCAGG 

ATGTTGACTCTGAGAAGAGGAGAAAGTCCAAAAGAAAGGTGACTGTTTTG

AAAGGAACTGCAAGTCAGAAAACCAAACAAAAGTGCAGGAAGAGTCTCTT

AGAGTCTACTCAAAGAAACAGAAAAAGAGCATCTGTGGTTCAGAAGGTGG 

GTGCTGATCGCGAGCTGGTGCCACAGGAAAGTCAACTCAACAGAACCCTC 

TGCCCTGCAGATGCCTGTGCAAGGGAGACTGTTGGCCTGGCTGGGGAAGA

AAAATCACCAAGCCCAGGACTGGATCTTTGTTTCATACAAGTAACTTCTG 

GCACCACAAACAAATTCCATTCAACTGAAGCAGCAGGTGAAGCAAATCGT 

GAGCAGACTTTTTTAGAATCAGAGGAAATCAGATCGAAGGGAGACAGAAA 

GGGGGAGGCACATTTGCATACTGGTGTTTTACAGGATGGCTCTGAAATGC 

CCAGCTGCTCACAAGCCAAGAAACACTTTACTTCTGAGACATTTCAAGgt 

actcagtgcatgaaaa 



FIGURE 19
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F 



Exon 4 

gactataaactaattttgcttctcagAAGACAGCATCCCACGGACACAAG 
TAGAAAAAAGGAAAACAAGCCTGTATTTTTCCAGCAAGTACAACAAAGAA 
Ggtacccacctttccctaagc 



Exon 5 

tatatttntgnagCTCTTAGCCCCCCAAGACGCAAATCCTTCAAGAAATG 
GACCCCTCCTCGGTCACCTTTTAATCTTGTTCAAGAAATACTTTTCCATG 
ACCCATGGAAGCTCCTCATCGCGACTATATTTCTCAATCGGACCTCAGg 
ttnggggtcattgncat



Exon 6 



tgtttatgctccccagGCAAGATGGCCATCCCTGTGCTGTGGGAGTTTCT
AGAGAAGTACCCTTCAGCTGAAGTGGCCCGAGCTGCCGACTGGAGGGACG 
TGTCGGAGCTTCTCAAGCCTCTTGGTCTCTACGATCTCCGTGCAAAAACC 
ATTATCAAGTTCTCAGgtatgtccccagcccag 



Exon 7 

tggatgtgtatccctcagATGAATATCTGACAAAGCAGTGGAGGTATCCG
ATTGAGCTTCATGGGATTTGGTTAAAATATGGCAACGACTCTACCGGAT
CTTTTGTGTCAATGAATGGAACAGgtaagcccaccactggggcc



FIGURE 19 cont.
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Exon 1 

GCGGCGGCGTCTGGGGCGCTTTCGCAACATTCAGACCTCGGTTGCAGCCCGGTGCCGTGAGCTGAA 
GAGGTTTCACATCTTACTCCGCCCCACACCCTGGGCGTTGCGGCGCTGGGCTCGTTGCTGCAGCCG 
GACCCTGCTCGATGGGCACGACTGGGCTGGAGAGTCTGAGTCTGGGGGACCGCGGAGCTGCCCCCA 
CCGTCACCTCTAGTGAGCGCCTAGTCCCAGACCCGCCGAATGACCTCCGgtaagttactgtcccct 
tttgggcctcagtttcaccacctgtaaaatggtatcgggagagtggacagtgtgtgggcctttcta 
acctttgacagagggtcggcanaaacctcgaagcccacgggtttagttactagggtctggagccca 
ggtgctcttcctgtgcgatcagc . . . 



Exon 2 

. . . tttgaaagacaaaaaat actcccatacrcacaaCTactaa tccacactaactttaatctccc 

tcattttaatatggataatctatgtggttcctgcattgtcatggattaaaactgagtaggcagtgg 

aagataaattttaaataagttaatcacttagactttgtttttccagCAAAGAAGATGTTGCTATGG 

AATTGGAAAGAGTGGGAGAAGATGAGGAACAAATGATGATAAAAAGAAGCAGTGAATGTAATCCCT 

TGCTACAAGAACCCATCGCTTCTGCTCAGTTTGGTGCTACTGCAGGAACAGAATGCCGTAAGTCTG 

TCCCATGTGGATGGGAAAGAGTTGTGAAGCAAAGGTTATTTGGGAAGACAGCAGGAAGATTTGATG 

TGTACTTTATCAGgtaagcatataagatggtaaagatagtacagccaaatgattttgtctggg^aa 

ataqtaaqaacataqc aaaaatcttaacttctttatatttttaccataaaaccattqcaqattc 

tattctttcaatgttgctattaattacatcaagtgatttggggaaaattacatacattttgtccct 

ccttctgtgaatggttaacgggtaggttgcattttagttatatttataaatttatattgtcataga 

ggaaccatttaaaaggccattatcactctttttcatttttaaatgacagagacctatggcaacatt 

tggaaattaattagaatctgaaatgtggtccagttcttttaaaagtcccttctatttactagcagt 

aagtttcctttaatatcattttctag(continues into exon 3, see below) 



FIGURE 20 
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aatctgaaatgtggtccagttcttttaaaagtcccttctatttactaffGaqtaagtttccttt 

aatatcattttctagCCCACAAGGACTGAAGTTCAGATCCAAAAGTTCACTTGCTAATTATCTTCA 

CAAAAATGGAGAGACTTCTCTTAAGCCAGAAGATTTTGATTTTACTGTACTTTCTAAAAGGGGTAT 

CAAGTCAAGATATAAAGACTGCAGCATGGCAGCCCTGACATCCCATCTACAAAACCAAAGTAACAA 

TTCAAACTGGAACCTCAGGACCCGAAGCAAGTGCAAAAAGGATGTGTTTATGCCGCCAAGTAGTAG 

TTCAGAGTTGCAGGAGAGCAGAGGACTCTCTAACTTTACTTCCACTCATTTGCTTTTGAAAGAAGA . 

TGAGGGTGTTGATGATGTTAACTTCAGAAAGGTTAGAAAGCCCAAAGGAAAGGTGACTATTTTGAA 

AGGAATCCCAATTAAGAAAACTAAAAAAGGATGTAGGAAGAGC 

TAGCAAAAGANAATCTGTGTGTAATAAAGCAGATGCTGAAAGTGAACCTGTTGCACAAAAAAGTCA 

GCTTGATAGAACTGTCTGCATTTCTGATGCTGGAGCATGTGGTGAGACCCTCAGTGTGAGCAGTGA 

AGAAAAOTGCCTTGTAAAAAAAAAAGAAAGATCATTGAGTTCAGGATCAAATTTTTGTTCTGAACA 

AAAAACTTOTGGCATCATAAACAAATTTTGTTCA 

GGATACCTTTTTAGAATCTGAAGAA 

GCATACTGACATTTTAAAACGTGGCTCTGAAATGGACAACAACTGCTCACCAACCAGGAAAGACTT 
CACTGgtgagaaaatatttcaaggtatccagtgctttcagcactattaaacattagtgatccaaaa 

atttatatqctacatc tqtatcgtgccatac 

Please note: at the end of exon 3, two alternative splice donor
sites are present (see Sequence Variations, page 40 of the
application).



Exon 4 and Exon 5 

tagtaccaagttcatgggtcattagttagattaattgggtatttatgtaaagggcttagaatagtg 
rctggrnta ^i-f-^ a f- aa t- ag hgh^gah a t-hathahttacatccctcaatattqctttaagcta 
aaccatagactccataaagtgtttacttttccttttcagAAGATACCATCCCACGAACACAGATAG 
AAAGAAGGAAAACAAGCCTGTATTTTTCCAGCAAATATAACAAAGAAGgtatccctttcccaatca 
gaacagcaaattctaattccattttgggttttcaattctgatgcactatgtttgtttagCTCTTAG 

CCCCCCACGACGTAAAGCCTTTAAGAAAT^ 

AACACTTTTTCATGATCCATGGAAGCTTCTCATCGCTACTATATTTCTCAATCGGACCT 1 1 

gcctgggttcaaagtcattttgagtgtgtcacctgggatagggcattccccctttcacccttaaac 
tcttcacctatgaggaaaatggggg 



FIGURE 20 Cont. 1
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Exon 6 

ccagtgttttttgttttttgttttctttaaaaaaaaaaaaaaaccctctggatgagatttctatga 
QaaactacttaaacQtaaaatc aacccacctaaaatcttataa tcattcaatt-.arhfttannt-trr 

cagGCAAAATGGCAATACCTGTGCTTTGGAAGTTTCTGGAGAAGTATCCTTCAGCTGAGGTAGCAA 
GAACCGCAGACTGGAGAGATGTGTCAGAACTTCTTAAACCTCTTGGTCTCTACGATCTTCGGGCAA 
AAACCATTGTCAAGTTCTCAGgtattttcctatacacccaaaggaaaaacataatacattgtgctt 
atttaa aaffaqccacaccttaaacttt aatattctcaaatactabattaatggagghffft-ra 

gctcaagcatttaaaaaagtccacttttccccaaaccacagtctcccactgacctaaacaataaat 
cttt 

Exon 7 

^fj-f- aqa » q^^rT a Gct:qat:aatataq aatqttcTtattcttcagATGAATACCTGACAAAGCAG 

TGGAAGTATCCAATTGAGCTTCATGGGATTGGTAAATATGGCAACGACTCTTACCGAATTTTTO 

GTCAATGAGTGGAAGCAGgtgaggctcactcccatccataattcagcacatt ^ffgtctctffaqq 

caaaataag tccaccattatggttaagacnatttattggggatacaaatgctattacagtcacaa 

caattgtgttcctggctgcggggaagcgngtggcatgtgggttttggggtttttgatcagtaggcg 

ctcccagg 
Exon 8 

tgtgtgagattaccttaatataaggtataacttaaaatattcatgaatcccaggaggttaaaggtt 

g f a ^^hh a gg^ a hq g tatGqtaa t-rTfcacfc^cccccaacaaacattt 

aaaaaatgtatttctgactaagttacatntaaggtctctgcctctgtatcttatgtttcttccagG 

TGCACCCTGAAGACCACAAATTAAATAA^^ 
GTTTATCTTAAACTCTGCAGCTTTCAAGCT 
TAATTAAGTACAACCAACCACCTTTCCAGCCATAGAGATTOT 
GTGTGTGCTTTCTTAATGTGTGTGCCAATGGTGGAT 

TGAGATTTTTTTAAAATAAATTATTATTTGACAACA*atccaaaaaaaatacggcttttcca 
tgaaatataatcagaagatgaaaaatagttctaaactatcaataatacaaagcaaatttctatca 

crGcttqctaaaac taaaqqcccactaaatatttt 
Please note: asterisk indicates the poly(A) addition site.



FIGURE 20 Cont. 2
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Complete sequence of the intron between exon 7 and exon 8- 

GGAAGCAGgtgaggctcactcccatccataattcagcacatttggtctctgaggcaaaataagtcc 
accattatggttaagactatttattggatacaaatgctattacagtcacaaacaattgtgttcctg 
gctgcggggaagcgagtggcatgtgggttttggggtttttgatcagtaagcgctcccaagtccaca 
aagaccagtccagcggcgtggcctctgactcatctccagtggtttgtcacctctggccctgttcct 
gtcattccctatttgtgtgctatctctaagcctgacgtggttttcctcctgtcaaaagtacaccac 
tacaggaaagcaggaaggtttgggccttgcaatgtatgcatattgggtttctcttagtggtctcag 
actacgtttgtggtgactgggtcctgcttcagccctgttgaatatgcccagcctgtggcatgctgg 
tggtcatcctggcagctggtgggtggcctggtatgctgcccactcagcttgagactcaccctcatg 
cattcagccagtaggtctggccaagcctgaactgaaggaccatggtcctatcccagcttcatcaca 
gcaatccattgtgacctgagaatccatttaacctctcggtctagaacctccttctggaaagtgagg 
tattaatacttgactcaatgttatcgccaccccacattctaagtcatggttgagtagtaatttgga 
cagtaccttgtaaattgtgtgagattaccttaatataaggtataacttaaaatattcatgaatccc 
aggaggttaaaggttataacttttaggtatggtatcgtaatgtactgtcccccagcaaacatttaa 
aaagccaattttaaaaaatgtatttctgactaagttacattaaggtctctgcctctgtatcttatg 
tttcttccagGTGCACCC 



FIGURE 20 
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